Abstract

We bound the time it takes for a group of birds to reach steady state in 
a standard flocking model. We prove that (i) within single exponential time 
fragmentation ceases and each bird settles on a fixed flying direction; (ii) the
flocking network converges only after a number of steps that is an iterated 
exponential of height logarithmic in the number of birds. We also prove 
the highly surprising result that this bound is optimal. The model directs 
the birds to adjust their velocities repeatedly by averaging them with their 
neighbors within a fixed radius. The model is deterministic, but we show that 
it can tolerate a reasonable amount of stochastic or even adversarial noise. 
Our methods are highly general and we speculate that the results extend 
to a wider class of models based on undirected flocking networks, whether
defined metrically or topologically. This work introduces new techniques of 
broader interest, including the flight net, the iterated spectral shift, and a 
certain residue-clearing argument in circuit complexity.



1 Introduction 

What do migrating geese, flocking cranes, bait balls of fish, prey-predator systems, 
and synchronously flashing fireflies have in common? All of them are instances
of natural algorithms, ie, algorithms designed by evolution over millions of years. 
By and large, their study has been the purview of dynamical systems theory 
within the fields of zoology, ecology, evolutionary biology, etc. The main purpose 
of this work is to show how combinatorial and algorithmic tools from computer 
science might be of benefit to the study of natural algorithms — in particular, in the 
context of collective animal behavior [20]. We consider a classical open question 
in bird flocking: bounding the convergence time in a standard neighbor-based 
model. We give a tight bound on the number of discrete steps required for a
group of n birds to reach steady state. We prove that, within time exponential 
in n, fragmentation ceases and each bird settles on a fixed flying direction. We 
also show that the flocking network converges after a number of steps that never 
exceeds an iterated exponential of height logarithmic in n. Furthermore, we show 
that this exotic bound is in fact optimal. If we view the set of birds as a distributed 
computing system, our work establishes a tight bound on the maximum execution 
time. Holding for a large family of flocking mechanisms, it should be thought of 
as a busy beaver type result — or perhaps busy goose. 

The bound is obtained by investigating an intriguing "spectral shift" process, 
which could be of independent interest. In the model, birds forever adjust their 
velocities at discrete time steps by averaging them with their neighbors flying 
within a fixed distance. The model is deterministic but we show that it tolerates a 
reasonable amount of stochastic or even adversarial noise. While, for concreteness,
we settle on a specific geometric model, our methods are quite general and we
suspect the results can be extended to a large class of flocking models, including 
topological networks [1]. The only serious limitation is that the flocking network 
must be undirected: this rules out models where one bird can process information 
from another one while flying in its "blind spot." 

Bird flocking has received considerable attention in the scientific and engineer- 
ing literature, including the now-classical Boids model of Reynolds [21,26-28]. 
Close scrutiny has been given to leaderless models where birds update their
velocities by averaging them out over their nearest neighbors. Two other rules are
often added: one to prevent birds from colliding; the other to keep them together. 
Velocity averaging is the most general and fundamental rule and, understandably, 
has received the most attention. Computer simulations support the intuitive be- 
lief that, by repeated averaging, each bird should eventually converge to a fixed 
speed and heading. This has been proven theoretically, but how long it takes for 
the system to converge had remained an open problem. The existential question 
(does the system converge?) has been settled in many different ways, and it is 
useful to review the history briefly. 

A "recurrent connectivity" assumption stipulates that, over any time interval 
of a fixed length, every pair of birds should be able to communicate with each 
other, directly or indirectly via other birds. Jadbabaie, Lin, and Morse [9] proved 
the first of several convergence results under that assumption (or related ones [16, 
17,23,27]). Several authors extended these results to variable-length intervals [8, 
13, 15]. They established that the bird group always ends up as a collection of 
separate flocks (perhaps only one), each one converging toward its own speed and
heading. Some authors have shown how to do away with the recurrent connectivity 
assumption by changing the model suitably. Tahbaz-Salehi and Jadbabaie [24], for 
example, assume that the birds fly on the surface of a torus. Cucker and Smale [7] 
use a broadcast model that extends a bird's influence to the entire group while 
scaling it down as a function of distance. In a similar vein, Ji and Egerstedt [10] 
introduce a hysteresis rule to ensure that connectivity increases over time. Tang 
and Guo [25] prove convergence in a high-density probabilistic model. Recent
work [1] suggests a "topological" rule for linking birds: a bird is influenced by 
a fixed number of its neighbors instead of all neighbors within a fixed distance. 
Whether the criteria are metric or topological, the bulk of work on leaderless 
flocking has assumed neighbor-based consensus rules. We are not aware of any 
bounds on the convergence time. 

Our model is a variant of the one proposed by Cucker and Smale [7], which 
is itself a holonomic variant of the classical Vicsek model [29]. Given n birds 
$B_1, \ldots, B_n$, represented at time t by points $x_1(t), \ldots, x_n(t)$ in $E^d$, the flocking
network $G_t$ has a vertex for each bird and an edge between any two of them within
distance 1 of each other. By convention, $G_t$ has no self-loops. The connected
components of $G_t$ are the flocks of the system. If $d_i(t)$ denotes the number of
birds adjacent to $B_i$ at time t, the total number of birds within the closed unit
disk centered at $B_i$ is precisely $d_i(t) + 1$.




Figure 1: Each bird updates its velocity by averaging it with those of its neighbors within
a unit-radius circle. 



The Model. The input consists of the initial position $x(0)$ and velocity $v(1)$.
Both vectors belong to $\mathbb{R}^{dn}$, for any fixed $d \ge 1$. For $t \ge 1$ and $1 \le i \le n$,

$$x_i(t) = x_i(t-1) + v_i(t),$$

where¹

$$v_i(t+1) - v_i(t) = c_i(t) \sum_{(i,j) \in G_t} \bigl(v_j(t) - v_i(t)\bigr).$$

¹ We denote the coordinates of a vector $x(t)$ by $x_i(t)$ and the elements of a matrix $X(t)$ (resp.
$X_t$) by $x_{ij}(t)$ (resp. $(X_t)_{ij}$).
The self-confidence coefficients $c_i(t)$, so named because they tell us how much a
bird is influenced by its neighbors, are normalized so that $0 < c_i(t) d_i(t) < 1$. (See
"Discussion" section below for an intriguing interpretation of these constraints.)
We assume that $c_i(t)$ may vary only when $G_t$ does; in other words, while all
neighborly relations remain the same, so do the self-confidence coefficients. A
natural choice of coefficients is the one used in the classical Vicsek model [29]:
$c_i(t) = (d_i(t) + 1)^{-1}$, but we do not make this restrictive assumption here.

The model captures the simple intuition that, in an effort to reach consensus 
by local means, each bird should adjust its velocity at each step so as to be a 
weighted average of those of its neighbors. A mechanical interpretation sees in the 
difference $v_i(t+1) - v_i(t)$ the discrete analogue of the bird's acceleration, so that,
by Newton's Law, F = ma, a bird is subject to a force that grows in proportion to 
the differences with its neighbors. A more useful take on the model is to view it 
as a diffusion process: more precisely, as the discrete version of the heat equation 

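$$\frac{\partial v}{\partial t} = -C_t L_t v,$$

where the Laplacian $L_t$ of the flocking network $G_t$ is defined by:

$$(L_t)_{ij} = \begin{cases} d_i(t) & \text{if } i = j; \\ -1 & \text{if } (i,j) \in G_t; \\ 0 & \text{else,} \end{cases}$$

and $C_t = \operatorname{diag} c(t)$ is the self-confidence matrix. Thus we express the dynamics of
the system as

$$v(t+1) - v(t) = -C_t L_t v(t).$$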

This is correct in one dimension. To deal with birds in d-space, we use a standard
tensor lift. Here is how we do it. We form the velocity vector $v(t)$ by stacking
$v_1(t), \ldots, v_n(t)$ together into one big column vector of dimension $dn$. Given a
matrix $A$, the product² $(A \otimes I_d) v(t)$ interlaces into one vector the d vectors obtained
by multiplying $A$ by the vector formed by the k-th coordinate of each $v_i(t)$, for
$k = 1, \ldots, d$. The heat equation would now be written as

$$v(t+1) = (P(t) \otimes I_d)\, v(t),$$

where $P(t) = I_n - C_t L_t$. One can check directly that the transition matrix $P(t)$
is row-stochastic. In the case of a 3-node path, for example, $P(t)$ has the form:

$$\begin{pmatrix} 1 - c_1(t) & c_1(t) & 0 \\ c_2(t) & 1 - 2c_2(t) & c_2(t) \\ 0 & c_3(t) & 1 - c_3(t) \end{pmatrix}.$$

² The Kronecker product $A \otimes B$ of two matrices $A$ and $B$ is the matrix we get if we replace
each $a_{ij}$ by the block $a_{ij} B$. Formally, if $A$ is m-by-n and $B$ is p-by-q, then the product $A \otimes B$
is the mp-by-nq matrix $C$ such that $c_{ip+j,\,kq+l} = a_{i,k}\, b_{j,l}$. We will often use, with no further
mention, the tensor identity $(A \otimes B)(C \otimes D) = AC \otimes BD$.
Figure 2: A 3-node flock with the transitions of the middle node indicated
by curved arrows.
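
The 3-node example can be checked numerically. The following is a minimal Python sketch, not taken from the paper: it builds the flocking network from positions, forms $P(t) = I_n - C_t L_t$, and confirms that the result is row-stochastic. The function names `flocking_network` and `transition_matrix`, the sample positions, and the sample coefficients are illustrative assumptions, as is the choice to exempt isolated birds from the normalization check.

```python
import numpy as np

def flocking_network(positions, radius=1.0):
    """Boolean adjacency matrix: an edge joins two distinct birds within `radius`."""
    pts = np.asarray(positions, dtype=float)
    if pts.ndim == 1:                 # allow a flat list for birds on a line
        pts = pts[:, None]
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = dist <= radius
    np.fill_diagonal(adj, False)      # by convention, no self-loops
    return adj

def transition_matrix(adj, c):
    """Return P = I - C L, where L is the Laplacian of `adj` and C = diag(c)."""
    adj = np.asarray(adj, dtype=float)
    c = np.asarray(c, dtype=float)
    degrees = adj.sum(axis=1)
    # Normalization from the text: c_i * d_i < 1 (with c_i > 0).
    # Assumption: isolated birds (d_i = 0) pass the check trivially.
    if np.any(c <= 0.0) or np.any(c * degrees >= 1.0):
        raise ValueError("coefficients violate 0 < c_i and c_i * d_i < 1")
    laplacian = np.diag(degrees) - adj
    return np.eye(len(c)) - np.diag(c) @ laplacian

if __name__ == "__main__":
    # Three birds on a line: only consecutive birds are within unit distance.
    adj = flocking_network([0.0, 0.8, 1.6])
    P = transition_matrix(adj, [0.3, 0.25, 0.4])
    print(P)
    print("row sums:", P.sum(axis=1))
```

With these sample coefficients the middle row reads $(c_2, 1 - 2c_2, c_2)$, matching the pattern of the 3-node path matrix.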



The dynamics of flocking is captured by the two equations of motion: For any
$t \ge 1$,

$$\begin{aligned} x(t) &= x(t-1) + v(t); \\ v(t+1) &= (P(t) \otimes I_d)\, v(t). \end{aligned}$$
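
To make the tensor lift concrete, here is a small NumPy sketch (an illustration, not the paper's code). It checks that multiplying the stacked velocity vector by $P \otimes I_d$ gives the same result as applying $P$ to each coordinate separately. The helper names `lifted_step` and `coordinatewise_step` are hypothetical, and the velocities are assumed to be stored as an n-by-d array whose i-th row is the velocity of the i-th bird.

```python
import numpy as np

def lifted_step(P, velocities):
    """Apply (P ⊗ I_d) to the stacked vector (v_1, ..., v_n) of length d*n."""
    n, d = velocities.shape
    stacked = velocities.reshape(n * d)       # row-major: v_1 first, then v_2, ...
    return np.kron(P, np.eye(d)) @ stacked

def coordinatewise_step(P, velocities):
    """Apply P to each of the d coordinate columns, then stack the result."""
    return (P @ velocities).reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.random((4, 4))
    P /= P.sum(axis=1, keepdims=True)         # make P row-stochastic
    V = rng.standard_normal((4, 3))           # 4 birds in 3-space
    print(np.allclose(lifted_step(P, V), coordinatewise_step(P, V)))
```

In practice the coordinate-wise form avoids building the dn-by-dn matrix; the Kronecker form is mainly convenient for stating identities.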

For tie-breaking purposes, we inject a tiny amount of hysteresis into the
system. As we discuss below, this is necessary for convergence. Intuitively, the
rule prevents edges of the flocking network from breaking because of microscopic
changes. Formally, an edge of $G_t$ remains in $G_{t+1}$ if the distance between $B_i$
and $B_j$ changes by less than $\varepsilon_h > 0$ between times t and t + 1. We choose $\varepsilon_h$ to be
exponentially small for illustrative purposes only; in fact, virtually any hysteresis
rule would work.
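
The equations of motion and the hysteresis rule can be put together in a short simulation. The sketch below is only an illustration under stated assumptions: it uses the Vicsek-style coefficients $c_i = 1/(d_i + 1)$ (which the paper does not require), builds the starting network directly from $x(0)$, and takes an arbitrary default for the hysteresis threshold `eps_h`. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distances between all pairs of rows of the (n, d) array x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def next_network(prev_adj, prev_dist, dist, eps_h):
    """Edges of the next network: birds within unit distance, plus (hysteresis)
    previous edges whose length changed by less than eps_h in the last step."""
    adj = (dist <= 1.0) | (prev_adj & (np.abs(dist - prev_dist) < eps_h))
    np.fill_diagonal(adj, False)       # no self-loops
    return adj

def vicsek_transition(adj):
    """P = I - C L with the assumed choice c_i = 1 / (d_i + 1)."""
    deg = adj.sum(axis=1).astype(float)
    laplacian = np.diag(deg) - adj.astype(float)
    return np.eye(len(deg)) - np.diag(1.0 / (deg + 1.0)) @ laplacian

def simulate(x0, v1, steps, eps_h=1e-9):
    """Run `steps` rounds of x(t) = x(t-1) + v(t), v(t+1) = P(t) v(t)."""
    x = np.array(x0, dtype=float)      # shape (n, d)
    v = np.array(v1, dtype=float)      # shape (n, d)
    dist = pairwise_distances(x)
    adj = dist <= 1.0                  # assumed starting network, built from x(0)
    np.fill_diagonal(adj, False)
    for _ in range(steps):
        x = x + v                      # move with the current velocity
        new_dist = pairwise_distances(x)
        adj = next_network(adj, dist, new_dist, eps_h)
        dist = new_dist
        v = vicsek_transition(adj) @ v # same P(t) acts on every coordinate
    return x, v, adj

if __name__ == "__main__":
    x, v, adj = simulate([[0.0], [0.5]], [[0.1], [-0.1]], steps=5)
    print(x.ravel(), v.ravel(), adj)
```

Two adjacent birds with opposite velocities average to a common velocity after one step, while birds that never come within unit distance keep their initial velocities.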

The Results. To express our main result, we need to define the fourth level of
the Ackermann hierarchy, the so-called "tower-of-twos" function: $2 \uparrow\uparrow 1 = 2$ and,
for $n > 1$, $2 \uparrow\uparrow n = 2^{2 \uparrow\uparrow (n-1)}$. The bird group is said to have reached steady state
when its flocking network no longer changes. All the results below hold in any
fixed dimension $d \ge 1$.
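
To get a feel for how fast the tower-of-twos function grows, here is a minimal Python sketch of the recursive definition. The function name `tower_of_twos` is an illustrative choice, not notation from the paper.

```python
def tower_of_twos(height: int) -> int:
    """Tower of twos: value 2 at height 1, and 2 ** (previous value) above that."""
    if height < 1:
        raise ValueError("height must be a positive integer")
    value = 2
    for _ in range(height - 1):
        value = 2 ** value
    return value

if __name__ == "__main__":
    for h in range(1, 5):
        print(h, tower_of_twos(h))
    # Height 5 is 2 ** 65536; report its size in bits rather than printing it.
    print(5, "bit length:", tower_of_twos(5).bit_length())
```

Height 6 is already far beyond anything that can be stored, which is why the logarithmic height of the tower in the main result matters so much.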

• A group of n birds reaches steady state in fewer than $2 \uparrow\uparrow (4 \log n)$ steps. The
maximum number of switches in the flocking network of n birds is at most
$n^{O(n^2)}$. The limit configuration of each bird $B_i$ is of the form $a_i + b_i t$, where
$a_i, b_i$ are d-dimensional rational vectors. After the fragmentation breakpoint
$t_f = n^{O(n^3)}$, network edges can only appear and never vanish.

• There exists an initial configuration of n birds that requires more than $2 \uparrow\uparrow \log \frac{n}{2}$
steps before reaching steady state. The lower bound holds both with and
without hysteresis.

Past the fragmentation breakpoint, the direction of each bird is essentially
fixed, so $t_f$ is effectively the bound for physical convergence. (Of course,
damped local oscillations typically go on forever.) Combinatorial convergence
is another matter altogether. It might take an extraordinarily long time before
the network stops switching. The tower-of-twos' true height is actually less than
$4 \log n$, ie, a little better than stated above: specifically, the factor 4 can be
replaced by $(\log x_0)^{-1}$, where $x_0$ is the unique real root of $x^5 - x^2 - 1$; this
factor is about 3.912.
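
The constant 3.912 can be reproduced numerically. The sketch below assumes that the polynomial in question is $x^5 - x^2 - 1$ (a reading consistent with the quoted value) and that the logarithm is to the base 2, as elsewhere in the paper. The helper names are hypothetical.

```python
import math

def quintic(x: float) -> float:
    """Assumed polynomial whose unique real root is x_0."""
    return x**5 - x**2 - 1.0

def bisect_root(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Plain bisection; requires f(lo) < 0 < f(hi)."""
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("root is not bracketed")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def height_factor() -> float:
    """Return 1 / log2(x_0), the factor that can replace 4 in the tower height."""
    x0 = bisect_root(quintic, 1.0, 2.0)
    return 1.0 / math.log2(x0)

if __name__ == "__main__":
    print("x0     =", bisect_root(quintic, 1.0, 2.0))
    print("factor =", height_factor())
```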



[Timeline, left to right: fragmentation, merge phase, steady state.]

Figure 3: Flocks cease to lose edges after the fragmentation breakpoint $t_f$
and can only gain new ones. The network reaches steady state after a tower-
of-twos of height logarithmic in the number of birds.



• How many bits? The self-confidence matrices $C_t$ are rational with $O(\log n)$ bits
per entry. The bound on the maximum number of network switches holds even
if the inputs are arbitrary real numbers. Obviously, there is no hope of bounding
the convergence time if two birds can be initialized to fly almost parallel to each
other; therefore bounding the representation size of the input is necessary. The
initial position and velocity of each bird are encoded as rationals over p bits. Our
results hold for virtually any value of p. The dependency on p begins to show
only for $p \ge n^3$, so this is what we shall assume when proving the upper bound
on the convergence time. Keep in mind that p is only an upper bound and the
actual number of bits does not need to be this long. In fact, the lower bound
requires only $\log n$ bits per bird. All computation is exact. The upper bound³ of
$2 \uparrow\uparrow (4 \log n)$ is extremely robust, and holds for essentially any conceivable input
bit-size and hysteresis rule.

³ Logarithms to the base 2 are written as log while the natural variety is denoted by ln. For
convenience we assume throughout this paper that n, the number of birds, is large enough. To
handle small bird groups, of course, we can always add fictitious birds that never interact with
anyone.

• Is the lower bound pathological? Surprisingly, the answer is no. As we mentioned,
initial conditions require only $p = O(\log n)$ bits per bird. Our construction ensures
that the hysteresis rule never kicks in, so the lower bound holds whether the model
includes hysteresis or not. The flocks used for the construction are single paths,
and the matrix $P(t)$ corresponds to a lazy random walk with probability 1/3 of
staying in place. The lower bound holds in any dimension $d > 0$. Here are the
initial positions and velocities for $d = 1$:




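$$x(0) = \Bigl(0, \tfrac{2}{3}, 2, \tfrac{8}{3}, \ldots, 2l, 2l + \tfrac{2}{3}, \ldots, n-2, n - \tfrac{4}{3}\Bigr)^T;$$

$$v(1) = \bigl(\underbrace{n^{-11}, 0, -n^{-11}, 0, n^{-11}, 0, -n^{-11}, 0, \ldots}_{n}\bigr)^T.$$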



Flocking obeys two symmetries: one translational; the other kinetic (or "relativis- 
tic," as a physicist might say). The absolute positioning of the birds is irrelevant 
and adding a fixed vector to each bird's velocity has no effect on flocking. In 
other words, one cannot infer velocity from observing the evolution of the flocks. 
Indeed, only differences between velocities are meaningful. This invariance under 
translation in velocity space implies that slow convergence cannot be caused by
forcing birds to slow down. In fact, one can trivially ensure that no bird speed falls 
below any desired threshold. The lower bound relies on creating small angles, not 
low speeds. (Thus, in particular, the issue of stalling does not arise.) To simplify 
the lower bound proof, we allow a small amount of noise into the system. Within 
the next $n^{O(1)}$ steps following any network switch, the velocity of an m-bird flock
may be multiplied by $I_m \otimes \hat\alpha$, where $\hat\alpha$ is the diagonal matrix with $\alpha = (\alpha_1, \ldots, \alpha_d)$
along the diagonal and rational $|\alpha_i| \le 1$ encoded over $O(\log n)$ bits. The noise-free
case corresponds to $\alpha_i = 1$. The perturbed velocity at time t should not differ
from the original one by more than $\delta_t = \frac{\log t}{t}\, e^{O(n^3)}$ but we allow a number of
perturbations as large as $e^{O(n^3)}$. This noise model could be enriched considerably
without affecting the convergence bounds, but our choice was guided by simplicity. 
Note that some restrictions are necessary for convergence; trivially, noise must be 
bounded past the last switch since two flocks flying parallel to each other could 
otherwise be forced to merge arbitrarily far into the future. Switching to a noisy 
model has two benefits: one is a more general result, since the same upper bound 
on the convergence time holds whether the noise is turned on or off; the other is 
a simpler lower bound proof. It allows us to keep the initial conditions extremely 
simple. We use only $\log n$ perturbations and $\delta_t \approx 1/t$, so noise is not germane to
the tower-of-twos growth. 

• Why hysteresis? Network convergence easily implies velocity convergence, but 
the converse is not true: velocities might reach steady state while the network does 
not. Indeed, in §3.2 we specify a group of birds that alternates forever between
one and two flocks without ever converging. This is an interesting but somewhat 
peripheral issue that it is best to bypass, as is done in [10], by injecting a minute 
amount of hysteresis into the system. Whatever one's rule — and, as we mentioned 
earlier, almost any rule would work — it must be sound, meaning that any two birds 
at distance ever so slightly away from 1 should have the correct pairing status. 
Note that soundness does not follow immediately from our definition of hysteresis.
This will need to be verified. By construction, we know that any two birds within
unit distance of each other at time t are always joined by an edge of the flocking
network $G_t$. We will show that, if we set $\varepsilon_h = n^{-bn}$ for a large enough constant
b, then no two birds at distance greater than $1 + \sqrt{\varepsilon_h}$ are ever adjacent in $G_t$.

• How robust are the bounds? The tower-of-twos bound continues to hold regardless
(almost) of which hysteresis rule we adopt and how many input bits we allow.
The assumption $\varepsilon_h = n^{-bn}$ is introduced for notational convenience; for example,
it allows us to express soundness very simply by saying that no birds at
distance greater than $1 + \sqrt{\varepsilon_h}$ should ever be joined by an edge of the network.
Without the assumptions above, the bounds are more complicated. For the interested
reader, here is what happens to the number $N(n)$ of network switches and
the fragmentation breakpoint $t_f$, ie, the time after which flocks can only merge:



Discussion. How relevant are this paper's results? Why are they technically
difficult? We address these two points briefly. Our bounds obviously say nothing 
about physical birds in the real world. They merely highlight the exotic behavior 
of the mathematical models. Although we focus on a Cucker-Smale variant, we 
believe that the bounds hold for a much wider variety of neighbor-based models. 
We introduce new techniques that are likely to be of further interest. The most 
promising seems to be the notion of a "virtual bird" flying back in time. We design 
a structure, the flight net, that combines both kinetic and positional information 
in a way that allows us to use both the geometry and the algebra of the problem at 
the same time. Perhaps the most intriguing part of this work is the identification 
of a curious phenomenon, which we call the (iterated) spectral shift. 

Self-confidence leads to an interesting phenomenon. Too much of it prevents 
consensus but so does too little. Harmony in a group seems to be helped by a min- 
imum amount of self-confidence among its members. Both extreme selfishness and 
excessive altruism get in the way of reaching cohesion in the group. Self-confidence 
provides a retention mechanism necessary for reaching agreement. The coefficient 
$c_i(t) d_i(t)$ represents how much a bird lets itself be influenced by its neighbors. By
requiring that it be less than 1, we enforce a certain amount of self-confidence for 
each bird. This idea is not new and can be found in [8, 14, 15]. 

Besides noise and hysteresis, our model differs from Cucker-Smale [7] in two 
other ways. One is that our flocking networks are not complete graphs: they un- 
dergo noncontinuous transitions, which create the piecewise linearity of the system. 
Another difference is that the transition matrices of our model are not symmetric. 
This greatly limits the usefulness of linear algebra. The reason why might not be 
obvious, so here is some quick intuition. Cucker and Smale diagonalize the Laplacian
and note that, since only differences are of interest, the vectors might as well
be assumed to lie in the space $\mathbf{1}^\perp$. Not only is that space invariant under the
Laplacian but it contracts at an exponential rate set by the Fiedler number (the
second eigenvalue). From this, a quadratic Lyapunov function quickly emerges,
namely the energy $v^T L_t v$ of the system. When the graph is connected, the Fiedler
number is bounded away from 0 by an inverse polynomial, so differences between
velocities decay to 0 at a rate of $2^{-t n^{-c}}$ for some constant $c > 0$.
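
The symmetric picture can be illustrated on a path, one of the slowest-mixing connected graphs. The sketch below is an illustration only: it assumes a constant self-confidence coefficient of 1/3, so that $P = I - L/3$ is symmetric, which is the Cucker-Smale-style setting described above and not the general model of this paper. It compares the numerically computed Fiedler number with the known closed form $2 - 2\cos(\pi/n)$ for a path on n nodes, and tracks the energy $v^T L v$ over a few steps.

```python
import numpy as np

def path_laplacian(n: int) -> np.ndarray:
    """Laplacian of the path graph on n nodes."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return np.diag(adj.sum(axis=1)) - adj

def fiedler_number(laplacian: np.ndarray) -> float:
    """Second-smallest eigenvalue of a symmetric Laplacian."""
    return float(np.linalg.eigvalsh(laplacian)[1])

def energy(laplacian: np.ndarray, v: np.ndarray) -> float:
    """Quadratic energy v^T L v (one velocity coordinate)."""
    return float(v @ laplacian @ v)

if __name__ == "__main__":
    n = 20
    L = path_laplacian(n)
    print("Fiedler number:", fiedler_number(L))
    print("closed form   :", 2.0 - 2.0 * np.cos(np.pi / n))
    P = np.eye(n) - L / 3.0            # assumed constant coefficient c = 1/3
    v = np.random.default_rng(0).standard_normal(n)
    for step in range(5):
        print(step, energy(L, v))
        v = P @ v
```

The Fiedler number of the path shrinks roughly like $\pi^2 / n^2$, an inverse polynomial, and the printed energies never increase.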

In the nonsymmetric case (ours), this approach is doomed. If, by chance, all 
the transition matrices had the same left eigenvectors, then the variance of the 
time-dependent Markov chain sampled at the (common) stationary distribution 
would in fact be a valid Lyapunov function, but that assumption is completely 
unrealistic. In fact, it has been proven [9, 19] that the dynamical systems under 
consideration do not admit of any suitable quadratic Lyapunov function for n > 8. 
Worse, as was shown by Olshevsky and Tsitsiklis [19], there is not even any hope 
of finding something weaker, such as a nonzero positive semidefinite matrix A 
satisfying, for any allowable transition $v(t) \to v(t+1)$,

$$A\mathbf{1} = 0; \qquad v(t+1)^T A\, v(t+1) \le v(t)^T A\, v(t).$$

Our transition matrices are diagonalizable, but the right eigenspace for the
subdominant eigenvalues is not orthogonal to $\mathbf{1}$ and the maps might not even be
globally nonexpansive: for example, the stochastic matrix

$$\frac{1}{15} \begin{pmatrix} 12 & 3 \\ 10 & 5 \end{pmatrix}$$

has the two eigenvalues 1 and 0.133; yet it stretches the unit vector (1,0) to one of
length 1.041. Linear algebra alone seems unable to prove convergence. The rationality
of limit configurations is not entirely obvious. In fact, the iterated spectral
shift is reminiscent of lacunary-series constructions of transcendental numbers,
which is not the most auspicious setting for proving rationality. This work draws
from many areas of mathematics and computer science, including Markov chains,
nonnegative matrices, algebraic graph theory, elimination theory, combinatorics,
harmonic analysis, circuit complexity, computational geometry, and of course linear
algebra.
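
The failure of nonexpansiveness is easy to confirm numerically. The sketch below assumes the example matrix is $\frac{1}{15}\begin{pmatrix} 12 & 3 \\ 10 & 5 \end{pmatrix}$, which is the 2-by-2 row-stochastic matrix consistent with the eigenvalues and the stretch factor quoted in the text; the variable names are illustrative.

```python
import numpy as np

# Assumed example: row-stochastic, eigenvalues 1 and 2/15 (about 0.133).
M = np.array([[12.0, 3.0],
              [10.0, 5.0]]) / 15.0

row_sums = M.sum(axis=1)
eigenvalues = np.sort(np.linalg.eigvals(M).real)

# Image of the unit vector (1, 0), taken as a column vector.
stretched = M @ np.array([1.0, 0.0])
stretched_length = float(np.linalg.norm(stretched))

if __name__ == "__main__":
    print("row sums    :", row_sums)
    print("eigenvalues :", eigenvalues)
    print("|M (1,0)^T| :", stretched_length)
```

Both eigenvalues have modulus at most 1, yet the Euclidean length of the image exceeds 1. This is possible because the matrix is not symmetric, so its eigenvectors are not orthogonal.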



2 A Bird's Eye View of the Proof

To establish a tight bound on the convergence time, we break down the proof
into four parts, each one using a distinct set of ideas. We briefly discuss each
one in turn. The first step is to bound the number of network switches while
ignoring all time considerations. This decoupling allows us to treat the problem
as purely one of information transfer. In one step a bird influences each one of
its neighbors by forcing its velocity into the computation of these neighbors' new 
velocities. This influence propagates to other birds in subsequent steps in a manner 
we can easily trace by following the appropriate edges along the time-dependent 
flocking network. Because of self-confidence, each bird influences itself constantly. 
It follows that once a bird influences another one (directly or indirectly via other 
birds) it does so forever, even if the two birds find themselves forever confined to 
distinct connected components. For this reason, influence alone is a concept of 
limited usefulness. We need another analytical tool: refreshed influence. Suppose 
that, at time $t_0$, $B_1$ claims influence on $B_2$. As we just observed, this claim will
hold for all $t > t_0$. But suppose that we "reboot" the system at time $t_0 + 1$ and
declare all influences void. We may now ask if $B_1$ will again claim influence on
$B_2$ at some time $t > t_0$ in the future: in other words, whether a chain of edges
will over time transfer information again from $B_1$ to $B_2$ after $t_0$. If yes, we then
speak of refreshed influence. Suppose now that $B_1$ exerts refreshed influence on
$B_2$ infinitely often: we call such influence recurrent. Although influence is not a
symmetric relation, it is an easy exercise to prove that recurrent influence is. 




Figure 4: Each bird is influenced by the one pointing to it. If this chain of 
influence occurs repeatedly (not necessarily with the same set of intermediate 
birds), a backward sphere of influence centered at the end of the chain will
begin to propagate backwards and eventually reach the first bird in the chain. 



This appears to be a principle of general interest. If political conversations 
consist of many two-way communications between pairs of people, with the pairs 
changing over time, then the only way A can influence B repeatedly is if it is 
itself influenced by B repeatedly. What makes this fact interesting is that it holds 
even if A and B never exchange opinions directly with each other and only a 
single pairwise communication occurs at any given time. Self-confidence plays 
an important role in this phenomenon. It provides information retention that 
prevents agents from being influenced by their own opinions in periodic fashion.
In fixed networks, this avoids the classical oscillation issue for random walks in 
bipartite graphs. 

In time-dependent networks, the role of self-confidence is more subtle. To 
understand it, one must first remember one fundamental difference between fixed 
directed and undirected consensus networks (ie, where at each step, the opinion 
at each node v is averaged over the opinions linked to by the edges from v). In 
a fixed directed network, the fraction of an agent's opinion that is measurable at 
some other node of the network might be exponentially small in the time elapsed 
since that opinion was expressed. This cannot happen in undirected networks: any 
fraction of an opinion is either 0 or bounded from below independently of time.
Time-dependent undirected networks, on the other hand, are expressive enough 
to (essentially) simulate fixed directed ones: time, indeed, can be used to break 
edge symmetry. The benefits of undirectedness are thus lost, and time-dependent
undirected consensus networks can behave much like fixed directed ones — see [5,6] 
for an application of this principle to interactive proof systems; in particular, 
they can witness exponential opinion propagation decay. Adding self-confidence 
magically prevents such decay. The idea would appear to warrant special scrutiny 
outside of its native habitat of computer science and control theory. 

• How many switches? Suppose that $B_1$ exerts recurrent influence on $B_2$. We
show that, at some point, both birds will join a connected component of the
flocking network and remain there forever. How many switches can occur before
that event? Let $V_1$ be the set of birds influenced by $B_1$. As soon as everyone in
$V_1$ has been influenced by $B_1$, let's "reboot" the system and define $V_2$ to be the
new set of birds with refreshed influence from $B_1$. Obviously $V_1 \supseteq V_2$. Repeating
this process leads to an infinite nested sequence

$$V_1 \supseteq V_2 \supseteq V_3 \supseteq \cdots \supseteq V_\infty,$$

where $V_\infty$ contains at least the two birds $B_1$ and $B_2$. Let $T_k$ be the formation time
of $V_k$ and let $\delta_k$ be the difference in velocity between the two birds at time $T_k$.
We wish we could claim a uniform bound, $\|\delta_k\|_2 \le (1 - \varepsilon)\|\delta_{k-1}\|_2$, for some fixed
$\varepsilon > 0$ independent of the time difference $T_k - T_{k-1}$. Indeed, this would show that,
for k large enough, the two velocities are close enough for the hysteresis rule to
kick in and keep the two birds together in the same flock forever. Of course, since
the two birds need not be adjacent, this argument should be extended to all pairs
of birds in $V_\infty$. While the inequality $\|\delta_k\|_2 \le (1 - \varepsilon)\|\delta_{k-1}\|_2$ is too much to ask
for, we show that $\|\delta_k\|_2 \le \zeta_k$, where $\zeta_k \le (1 - \varepsilon)\zeta_{k-1}$. In other words, the velocity
difference between $B_1$ and $B_2$ may not shrink monotonically, but it is bounded by
a function that does. The uniformity of the shrinking, which is crucial, depends
critically on self-confidence and the retention mechanism it implies. Technically,
this translates into a uniform lower bound on the nonzero entries of products of
stochastic matrices. This allows us to rescue the previous argument and bound
the value of k such that $V_k = V_\infty$. To bound the number of switches before time
$T_k$, we need to find how many of them can take place between a reboot at $T_{j-1}$
and the formation of $V_j$. The key observation is that $V_j$ is formed by a growth
process of smaller flocks (ie, all of them of size less than n): we can therefore set
up a recurrence relation and bound the number of switches inductively.

• How much time between switches? Flock behavior between switches is linear, 
so spectral analysis provides most of the tools we need to bound the inter-switch 
time. At time t, the number of bits needed to encode the velocity is (roughly)
$O(t)$. This means that, in the worst case, two birds can fly either parallel to
each other or at an angle at least $e^{-O(t)}$. From this we can infer that, should the
birds want to be joined together in the flocking network after time t, this union
must happen within a period of $e^{O(t)}$. Things are more complex if the stationary
velocities of the two flocks are parallel. We need to use root separation bounds for 
various extension fields formed by adjoining to the rationals all the relevant eigen- 
information. Intuitively, the question we must answer is how long one must wait 
for a system of damped oscillators to cross a given real semi-algebraic set with 
known parameters. All of these techniques alone can only yield a convergence 
time bound in the form of a tower-of-twos of height exponential in n. To bring 
the height down to logarithmic requires two distinct ideas from computational 
geometry and circuit complexity. 

• How to bring the height down to linear? So far, we have only used combinatorics, 
algebraic graph theory, linear algebra, and elimination theory. We use algorithmic 
ideas from convex geometry to reduce the height to linear. We lift the birds into 
4 dimensions (or $d + 1$ in general) by making time into one of the dimensions. We
then prove that, after exponential time, birds can only fly almost radially (ie, along
a line passing through the origin). This implies that, after a certain time threshold,
flocks can only merge and never fragment again. From that point on, reducing 
the height of the tower-of-twos to linear is easy. Our geometric investigation 
introduces the key idea of a virtual bird. The stochastic transitions have a simple 
geometric interpretation in terms of new velocities lying in the convex hulls of 
previous ones. This allows us to build an exponential-size flight net consisting of 
convex structures through which all bird trajectories can be monitored. A useful 
device is to picture the birds flying back in time with exactly one of them carrying 
a baton. When a bird is adjacent to another one in a flock, it may choose to pass its 
baton. The trajectory of the baton is identified as that of a virtual bird. Because 
of the inherent nondeterminism of the process, we may then ask the question:
is there always a virtual bird trajectory that follows a near-straight line? The 
answer, obviously negative in the case of actual birds, turns out to be yes. This is 
the benefit of virtuality. This fact has numerous geometric consequences bearing 
on the angular flight motion of the real birds. 

• How to bring the height down to logarithmic? It is not so easy to build intuition 
for the logarithmic height of the tower-of-twos. A circuit complexity framework
helps to explain the residue clearing phenomenon behind it. To get a tower-of- 
twos requires an iterated spectral shift. When two flocks meet, energy must be 
transferred from the high-frequency range down to the lowest mode in the power 
spectrum. This process builds a residue: informally, think of it, for the purpose 
of intuition, as residual heat generated by the transfer. This heat needs to be 
evacuated to make room for further spectral shifts. The required cooling requires 
free energy in the form of previously created spectral shifts. This leads to an 
inductive process that limits any causal chain of spectral shifts to logarithmic 
length. The details are technical, and the best way to build one's intuition is to 
digest the lower bound first. 

• How to prove the optimality of the logarithmic height? The starting configu- 
ration is surprisingly simple. The n birds stand on a wire and fly off together at 
various angles. The initial conditions require only O(log n) bits per bird. The n 
birds meet in groups of 2, 4, 8, etc, forming a balanced binary tree. Every "colli- 
sion" witnesses a spectral shift that creates flying directions that are increasingly 
parallel; hence the longer waits between collisions. To simplify the calculations, 
we use the noisy model to flip flocks occasionally in order to reverse their flying 
directions along the X-axis. This occurs only log n times and can be fully 
accounted for by the model we use for the upper bound. Because the flocks are 
simple paths, we can use harmonic analysis for cyclic groups to help us resolve all 
questions about their power spectra. 



3 The Upper Bound 



We begin with a few opening observations in §3.1. We explore both the algebraic 
and geometric aspects of flocking in §3.2. We establish a crude convergence bound 
in §3.3, which gives us a glimpse of the spectral shift. An in-depth study of its 
combinatorial aspects is undertaken in §3.4, from which a tight upper bound 
follows. We shall always assume that $p \ge n^3$. To highlight the robustness of the 
bounds, we leave both $p$ and $\varepsilon_h$ as parameters throughout much of our discussion, 
thus making it easier to calculate convergence times for arbitrary settings. For 
convenience and clarity, we adopt the default settings below in §3.4 (but not 
before). One should keep in mind that virtually any assignment of parameters 
would still produce a tower-of-twos. Let $b$ denote a large enough constant: 

$$p = n^3, \qquad \varepsilon_h = n^{-bn^3}. \tag{2}$$



*As a personal aside, let me say that I acquired that intuition only after I had established the 
matching lower bound. For this reason, I recommend reading the lower bound section before the 
final part of the upper bound proof. 
Recall that $p$ and $\varepsilon_h$ denote, respectively, the input bit-size and the hysteresis 
parameter. With these settings, the fragmentation breakpoint and the maximum 
switch count are both $n^{O(n^3)}$. 

3.1 Preliminaries 

We establish a few useful facts about the growth of the coordinates over time. It 
is useful to treat coordinates as integers, which we can do by expressing them as 
fractions sharing the same denominator. For example, the initial positions and 
velocities can be expressed either as $p$-bit rationals or, more usefully, as $O(pn)$-bit 
CD-rationals, ie, rationals of the form $p_i/q$, with the common denominator 
$q$. We mention some important properties of such representations. We will also 
introduce some of the combinatorial tools needed to measure ergodicity. The 
objective is to predict how fast backward products of stochastic matrices tend to 
rank-one matrices. We treat the general case in this section and investigate the 
time-invariant case in the next. 

Numerical Complexity. The footprint of a matrix $A$ is the matrix $\underline{A}$ derived 
from $A$ by replacing each nonzero entry by 1. For $t \ge s$, we use $P(t,s)$ as shorthand 
for $P(t)P(t-1)\cdots P(s)$. Note that, in the absence of noise, the fundamental 
equation (1) can be rewritten as 

$$v(t+1) = (P(t,1) \otimes I_d)\, v(1).$$

A bird may influence another one over a period of time without the converse being 
true; in other words, the matrices $P(t,s)$ and $\underline{P}(t,s)$ are in general not symmetric; 
the exception is $\underline{P}(t)$, which not only is symmetric but has its diagonal full of ones. 
Because of this last property, $\underline{P}(t,s)$ can never lose any 1 as $t$ grows, or to put it 
differently the corresponding graph can never lose an edge. Before we get to the 
structural properties of $P(t,s)$, we need to answer two basic questions: how small 
can the nonzero entries be and how many bits do we need to represent them? 
As was shown in [8, 14], nonzero elements of $P(t,s)$ can be bounded uniformly, 
ie, independently of $t$. Note that this relies critically on the positivity of the 
diagonals. Indeed, without the condition $c_i(t)d_i(t) < 1$, we could choose $P(t) = A$ 
for even $t$ and $P(t) = B$ for odd $t$, where 

$$A = \frac{1}{2}\begin{pmatrix} 0 & 1 & 1 \\ 2 & 0 & 0 \\ 2 & 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

For even $t > 0$, 

$$P(t,1) = \begin{pmatrix} 2^{-t/2} & 1 - 2^{1-t/2} & 2^{-t/2} \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

To understand this process, think of a triangle with a distinguished vertex called 
the halver. Each vertex holds an amount of money. At odd steps, the halver 
splits its amount in half and passes on each half to its neighbor; the other ver- 
tices, meanwhile, pass on their full amount to the halver. The total amount of 
money in the system remains the same. At the following (even) step, the role of 
halver is handed to another vertex (which one does not matter); and the process 
repeats itself. This alternate sequence of halving and relabeling steps produces an 
exponential decay. If each vertex is prohibited to pass its full amount, however, 
then money travels while leaving a "trace" behind. As we prove below, exponen- 
tial decay becomes impossible. This prohibition is the equivalent of the positive 
self-confidence built into bird flocking. 
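
The contrast between the two regimes can be checked numerically. The sketch below is only an illustration under our own assumptions: the matrices `A` and `B` are one hypothetical encoding of the halving and relabeling steps on a triangle, and `lazy` (a name we introduce) blends a matrix with the identity as a stand-in for positive self-confidence. With exact rational arithmetic, the smallest nonzero entry of the backward product halves every two steps without the diagonal, and stays bounded away from zero with it.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def backward_product(seq):
    """P(t)...P(1) for seq = [P(1), ..., P(t)]: new factors multiply from the left."""
    prod = seq[0]
    for P in seq[1:]:
        prod = matmul(P, prod)
    return prod

def smallest_positive(M):
    """Smallest nonzero entry of a nonnegative matrix."""
    return min(x for row in M for x in row if x > 0)

def lazy(M, eps=F(1, 4)):
    """Blend M with the identity so that every diagonal entry is at least eps."""
    n = len(M)
    return [[(1 - eps) * M[i][j] + (eps if i == j else 0) for j in range(n)]
            for i in range(n)]

def alternating(t, even, odd):
    """[P(1), ..., P(t)] with P(u) = even for even u and P(u) = odd for odd u."""
    return [even if u % 2 == 0 else odd for u in range(1, t + 1)]

# Hypothetical halver (A) and relabeling (B) matrices; both lack a positive diagonal.
A = [[F(0), F(1, 2), F(1, 2)], [F(1), F(0), F(0)], [F(1), F(0), F(0)]]
B = [[F(0), F(1), F(0)], [F(1), F(0), F(0)], [F(0), F(0), F(1)]]

def decay_profile(ts, even, odd):
    """Smallest nonzero entry of P(t, 1) for each t in ts."""
    return [smallest_positive(backward_product(alternating(t, even, odd)))
            for t in ts]

print(decay_profile([2, 4, 20], A, B))                # exponential decay
print(decay_profile([2, 4, 20], lazy(A), lazy(B)))    # no such decay
```

The choice of `eps = 1/4` is arbitrary; any positive value prevents the exponential decay, in line with the uniform bound proved next.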

Lemma 3.1. For any $1 \le s \le t$, the elements of $P(t,s)$ are CD-rationals over 
$O((t-s+1)n\log n)$ bits. The nonzero elements are in $n^{-O(n^2)}$. 

Proof. Each row of $P(t)$ contains rationals with the same $O(\log n)$-bit denominator, 
so the matrix $P(t)$ can be written as $N^{-1}$ times an integer matrix, where 
both $N$ and the matrix elements are encoded over $O(n\log n)$ bits. The matrix 
$P(t,s)$ is a product of $t-s+1$ such matrices; hence a matrix with 
$O((t-s+1)n\log n)$-bit integer elements divided by a common $O((t-s+1)n\log n)$-bit 
integer. For the second part of the lemma, we use arguments from [8,14]. Recall 
that $P(t) = I_n - C_tL_t$, where $C_t$ is a diagonal matrix of positive rationals encoded 
over $O(\log n)$ bits, so the case $t = s$ is obvious. Let $p(t,s)$ be the smallest positive 
element of $P(t,s)$ and suppose that $t > s$. 

We begin with a few words of intuition. Because $P(t,s) = P(t)P(t-1,s)$, a 
nonzero entry $p_{ij}(t,s)$ is the expected value of $p_{kj}(t-1,s)$, for a random $k$ adjacent 
to $i$ in $P(t)$, or, to be more precise, in the graph induced by the nonzero elements 
of that matrix. If, for all such $k$, $p_{kj}(t-1,s) > 0$, then $p_{ij}(t,s)$, being an average 
of positive numbers, is at least $p(t-1,s)$, and we are done. On the other hand, 
having some $p_{kj}(t-1,s)$ equal to 0 means that the edge $(k,j)$ is missing from 
the "graph" $P(t-1,s)$. If we now consider the 2-edge path formed by $(k,i)$ in 
$P(t)$ and $(i,j)$ in $P(t-1,s)$, we conclude that at least one of $(i,j)$ or $(k,j)$ is a 
brand-new edge in $P(t,s)$. We then use the fact that such events happen rarely. 

• Suppose that $p_{kj}(t-1,s) > 0$ for each $i,j,k$ such that $p_{ij}(t,s)\,p_{ik}(t) > 0$. 
Then, for any $p_{ij}(t,s) > 0$, by stochasticity, 

$$p_{ij}(t,s) = \sum_k p_{ik}(t)\,p_{kj}(t-1,s) \ge \Bigl(\sum_k p_{ik}(t)\Bigr)\,p(t-1,s) = p(t-1,s).$$

It follows that $p(t,s) \ge p(t-1,s)$. 

• Assume now that $p_{ij}(t,s)\,p_{ik}(t) > 0$ and $p_{kj}(t-1,s) = 0$ for some $i,j,k$. 
Since $p_{ij}(t,s)$ is positive, so is $p_{il}(t)\,p_{lj}(t-1,s)$ for some $l$; hence 
$p_{ij}(t,s) \ge p_{il}(t)\,p_{lj}(t-1,s) \ge p(t-1,s)\,n^{-O(1)}$. We show that this drop coincides with 
the gain of a 1 in $\underline{P}(t,s)$. The footprint of $P(t)$ is symmetric, so $p_{ki}(t) > 0$ 
and hence 

$$p_{kj}(t,s) = \sum_l p_{kl}(t)\,p_{lj}(t-1,s) \ge p_{ki}(t)\,p_{ij}(t-1,s) \ge n^{-O(1)}\,p_{ij}(t-1,s).$$

We distinguish between two cases. If $p_{ij}(t-1,s)$ is positive, then so is 
$p_{kj}(t,s)$. Since $p_{kj}(t-1,s) = 0$, the matrix $\underline{P}(t,s)$ has at least one more 
positive entry than $\underline{P}(t-1,s)$; recall that no entry can become null as we 
go from $P(t-1,s)$ to $P(t,s)$. On the other hand, if $p_{ij}(t-1,s) = 0$, our 
assumption that $p_{ij}(t,s) > 0$ leads us to the same conclusion. In both cases, 
$\underline{P}(t,s)$ differs from $\underline{P}(t-1,s)$ in at least one place: this cannot happen more 
than $n^2$ times. 

If we fix $s$ then $p(t,s) \ge p(t-1,s)$ for all but at most $n^2$ values of $t$. For the others, 
as we saw earlier, $p_{ij}(t,s) \ge p(t-1,s)\,n^{-O(1)}$; hence $p(t,s) \ge p(t-1,s)\,n^{-O(1)}$. 
Together with $p(s,s) \ge n^{-O(1)}$, this gives $p(t,s) \ge n^{-O(n^2)}$. □ 

The coordinates of $v(1)$ and $x(0)$ can be expressed as CD-rationals over $O(pn)$ 
bits. By the previous lemma, this implies that, in the noise-free case, for $t > 1$, 
$v(t) = (P(t-1,1) \otimes I_d)\,v(1)$ is a vector with CD-rational coordinates over 
$O(tn\log n + pn)$ bits. The equation of motion (1) yields 

$$x(t) = x(0) + \bigl((P(t-1,1) + \cdots + P(1,1) + I_n) \otimes I_d\bigr)\,v(1).$$

Note that $P(t-1,1) = N^{-1}Q$, where $Q$ is an integer matrix with $O(tn\log n)$-bit 
integer elements and $N$ is an $O(tn\log n)$-bit integer. The other matrices are 
subproducts of $P(t-1,1) = P(t-1)\cdots P(1)$, so we can also express them in 
this fashion for the same value of $N$. It follows that $v(t)$ and $x(t)$ have CD-rational 
coordinates over $O(tn\log n + pn)$ bits. Adding noise makes no difference 
asymptotically. Indeed, bringing all the coordinates of the scaling vectors $\alpha$ in 
CD-rational form adds only $O(n\log n)$ bits to the velocities at each step. 

Lemma 3.2. For any $t \ge 1$, the vectors $v(t)$ and $x(t)$ have CD-rational coordinates 
over $O(tn\log n + pn)$ bits. 
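
As a small illustration of the CD-rational representation, the following sketch rewrites a list of rationals over a common denominator and checks the bit-size bound informally. The helper name `to_cd` is ours, and taking the least common multiple is just one convenient choice of common denominator (the product of the denominators would do as well for the asymptotic bound).

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+

def to_cd(rationals):
    """Rewrite rationals as (numerators, q) sharing the common denominator q."""
    q = lcm(*(r.denominator for r in rationals))
    numerators = [r.numerator * (q // r.denominator) for r in rationals]
    return numerators, q

# Three rationals whose denominators each fit in p = 3 bits.
coords = [Fraction(1, 2), Fraction(2, 3), Fraction(-5, 4)]
nums, q = to_cd(coords)
print(nums, q)

# n rationals with p-bit denominators share a denominator of at most p*n bits.
p, n = 3, len(coords)
print(q.bit_length() <= p * n)
```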

The $\ell_\infty$ norm of the velocity vector never grows, as transition matrices only 
average the velocities out and the noise factors are bounded by 1: since $p \ge n^3$, it follows 
that, for any $t \ge 1$, 

$$\|v(t)\|_2 = 2^{O(p)}. \tag{3}$$

Ergodicity. Ignoring noise, the fundamental motion equation (1) gives the position 
of the birds at time $t > 1$ as $x(t) = x(0) + (P^*(t-1) \otimes I_d)\,v(1)$, where 

$$P^*(t) = P(1) + P(2)P(1) + P(3)P(2)P(1) + \cdots + P(t)\cdots P(2)P(1).$$
Products of the form $P(t)\cdots P(1)$ appear in many applications [22], including the 
use of colored random walks in space-bounded interactive proof systems [5,6]. One 
important difference is that random walks correspond to products that grow by 
multiplication from the right while the dynamics of bird flocking is associated with 
backward products: the transition matrices evolve by multiplication from the left. 
This changes the nature of ergodicity. Intuitively, one would expect (if all goes 
well) that these products should look increasingly like rank-1 matrices. But can 
the rows continue to vary widely forever though all in lockstep (weak ergodicity), 
or do they converge to a fixed vector (strong ergodicity)? The two notions are 
equivalent for backward products but not for the forward kind [22]. Here is an 
intuitive explanation. Backward products keep averaging the rows, so their entries 
themselves tend to converge: geometrically, the convex hull of the points formed 
by the row keeps shrinking. Forward products lack this notion of averaging. For 
a simple illustration of the difference, consider the three stochastic matrices: 

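$$A = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad B = \frac{1}{2}\begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}, \qquad C = \frac{1}{4}\begin{pmatrix} 3 & 1 \\ 3 & 1 \end{pmatrix}.$$

Backward products are given by the simple formula, 

$$\underbrace{\cdots ABABABAB}_{n} = C,$$

for all $n > 1$. On the other hand, the forward product tends to a rank-one matrix 
but never converges: 

$$\underbrace{ABABABAB\cdots}_{n} = \begin{cases} C & \text{even } n > 1; \\ A & \text{odd } n > 0. \end{cases}$$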



Figure 5: Premultiplying a matrix, whose rows are shown as points, by a 
stochastic matrix $P(t)$ shrinks its convex hull. 
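
The difference between backward and forward products is easy to reproduce. The sketch below assumes the concrete 2-by-2 matrices `A`, `B`, `C` written in the code (with `C = AB`); the function names are ours. It builds the $n$-letter backward word, whose rightmost letter is $B$, and the $n$-letter forward word, whose leftmost letter is $A$, using exact rational arithmetic.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Assumed example matrices: A is rank-one, B is not, and C = A B.
A = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]
B = [[F(1), F(0)], [F(1, 2), F(1, 2)]]
C = [[F(3, 4), F(1, 4)], [F(3, 4), F(1, 4)]]

def backward(n):
    """The n rightmost letters of ...ABABAB: P(1) = B, P(2) = A, ..., multiplied from the left."""
    prod = B
    for u in range(2, n + 1):
        prod = matmul(B if u % 2 == 1 else A, prod)
    return prod

def forward(n):
    """The n leftmost letters of ABABAB...: new letters are multiplied from the right."""
    prod = A
    for u in range(2, n + 1):
        prod = matmul(prod, A if u % 2 == 1 else B)
    return prod

for n in range(1, 7):
    print(n, backward(n) == C, forward(n) == C, forward(n) == A)
```

Both kinds of product become rank-one after two letters, but only the backward product settles on a single matrix.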



As we just mentioned, the key to ergodicity for backward products resides 
in the convex hull of the rows. We introduce a family of metrics to measure its 
"shrinkage." For any $p > 1$, let $\tau_p(A)$, the ergodicity coefficient of $A$, denote the 
$\ell_p$-diameter of the convex hull formed by the rows of a matrix $A$, ie, 

$$\tau_p(A) = \max_{i,j}\,\|A_{i*} - A_{j*}\|_p,$$

where $A_{i*}$ denotes the $i$-th row of $A$. From the fact that $\ell_p$ is a metric space for 
$p \ge 1$, it follows by convexity that the diameter is always achieved at vertices of 
the convex hull. We extend the definition to $p = 1$ but, for reasons soon to be 
apparent, it is important to keep the coefficients between 0 and 1, so we divide 
the diameter by two, ie, 

$$\tau_1(A) = \frac{1}{2}\max_{i,j}\sum_k |a_{ik} - a_{jk}|.$$

To understand why $\tau_p(A)$ relates to ergodicity, assume that $A$ is row-stochastic. 
We observe then that 

$$0 \le \tau_1(A) = 1 - \min_{i,j}\sum_k \min\{a_{ik}, a_{jk}\} \le 1.$$

This follows from the fact that the distance $|a - b|$ between two numbers $a, b$ is 
twice the difference between their average and the smaller one. There are many 
fascinating relations between these diameters [22]. For our purposes, the following 
submultiplicativity result will suffice [14]. 

Lemma 3.3. Given two row-stochastic matrices $A, B$ that can be multiplied, 

$$\tau_2(AB) \le \tau_1(A)\,\tau_2(B).$$

Proof. Fix the two rows $i, j$ that define $\tau_2(AB)$, and let $\alpha = 1 - \sum_k \min\{a_{ik}, a_{jk}\}$. 
Note that $0 \le \alpha \le \tau_1(A)$. If $\alpha = 0$, then $A_{i*} = A_{j*}$ and $\tau_2(AB) = 0$, so the lemma 
holds trivially. Assuming, therefore, that $\alpha > 0$, we derive 

$$\begin{aligned}
\tau_2(AB) &= \Bigl\|\sum_k a_{ik}B_{k*} - \sum_k a_{jk}B_{k*}\Bigr\|_2 \\
&= \Bigl\|\sum_k (a_{ik} - \min\{a_{ik},a_{jk}\})B_{k*} - \sum_k (a_{jk} - \min\{a_{ik},a_{jk}\})B_{k*}\Bigr\|_2 \\
&\le \tau_1(A)\,\Bigl\|\alpha^{-1}\sum_k (a_{ik} - \min\{a_{ik},a_{jk}\})B_{k*} - \alpha^{-1}\sum_k (a_{jk} - \min\{a_{ik},a_{jk}\})B_{k*}\Bigr\|_2 .
\end{aligned}$$

Observe now that the coefficients $\alpha^{-1}(a_{ik} - \min\{a_{ik}, a_{jk}\})$ are nonnegative and 
sum up to 1, so the corresponding sum is a convex combination of the rows of $B$. 
The same is true of the other sum; so, by convexity, the distance between any two 
of them cannot exceed $\tau_2(B)$. □ 
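
The inequality of Lemma 3.3 lends itself to a quick numerical sanity check. The sketch below is not part of the proof: it samples random row-stochastic matrices (the sampling scheme and the function names are our own choices) and compares $\tau_2(AB)$ with $\tau_1(A)\tau_2(B)$ in floating point, with a small tolerance for rounding.

```python
import math
import random

def tau1(A):
    """Half the l1-diameter of the set of rows of A."""
    return max(0.5 * sum(abs(a - b) for a, b in zip(ri, rj))
               for ri in A for rj in A)

def tau2(A):
    """The l2-diameter of the set of rows of A."""
    return max(math.sqrt(sum((a - b) ** 2 for a, b in zip(ri, rj)))
               for ri in A for rj in A)

def matmul(X, Y):
    """Product of an m-by-k matrix and a k-by-n matrix (lists of rows)."""
    return [[sum(x * Y[k][j] for k, x in enumerate(row)) for j in range(len(Y[0]))]
            for row in X]

def random_stochastic(n, rng):
    """A random n-by-n row-stochastic matrix (rows normalized to sum to 1)."""
    rows = []
    for _ in range(n):
        w = [rng.random() for _ in range(n)]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows

def lemma_holds(A, B, tol=1e-12):
    """Check tau2(AB) <= tau1(A) * tau2(B) up to rounding error."""
    return tau2(matmul(A, B)) <= tau1(A) * tau2(B) + tol

rng = random.Random(0)
trials = [(random_stochastic(5, rng), random_stochastic(5, rng)) for _ in range(200)]
print(all(lemma_holds(A, B) for A, B in trials))
```

For the identity matrix, `tau2` returns $\sqrt{2}$ and `tau1` returns 1, which are the values used later when bounding $\tau_2$ by $\sqrt{2}\,\tau_1$.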



Submultiplicativity is not true for $\tau_2$ in general. First, to make the notion meaningful, 
we would need to normalize it and use $\hat\tau_2 = \tau_2/\sqrt{2}$ instead, to ensure that $\hat\tau_2(A) \le 1$ for any 
stochastic $A$. Unfortunately, $\hat\tau_2$ is not submultiplicative, as we easily check by considering a 
regular random walk $A$ on $K_{2,2}$ and checking that $\hat\tau_2(A^2) > \hat\tau_2(A)^2$. 

Figure 6: $\tau_p(A)$ is the $\ell_p$-diameter of the convex hull of the rows of $A$. 



Displacement. For future use, we mention an elementary relation between bird 
distance and velocity. The relative displacement between two birds $B_i$ and $B_j$ is 
defined as $\Delta_{ij}(t) = |\mathrm{DIST}_t(B_i,B_j) - \mathrm{DIST}_{t-1}(B_i,B_j)|$, where the distance between 
two birds is denoted by $\mathrm{DIST}_t(B_i,B_j) = \|x_i(t) - x_j(t)\|_2$. 

Lemma 3.4. For $t \ge 1$, $\Delta_{ij}(t) \le \|v_i(t) - v_j(t)\|_2$. 

Proof. By the triangle inequality, 

$$\|x_i(t) - x_j(t)\|_2 \le \|x_i(t-1) - x_j(t-1)\|_2 + \|x_i(t) - x_i(t-1) - (x_j(t) - x_j(t-1))\|_2.$$

Reversing the roles of $t$ and $t-1$ gives us a similar inequality, from which we find 
that 

$$|\mathrm{DIST}_t(B_i,B_j) - \mathrm{DIST}_{t-1}(B_i,B_j)| \le \|x_i(t) - x_i(t-1) - (x_j(t) - x_j(t-1))\|_2.$$

Since $x_i(t) - x_i(t-1) = v_i(t)$, the right-hand side is $\|v_i(t) - v_j(t)\|_2$. 

□ 
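
A brief numerical check of Lemma 3.4 follows. It is a sketch under the assumption that positions are updated by $x(t) = x(t-1) + v(t)$, which is how we read the equation of motion; the function names and the random test data are ours.

```python
import math
import random

def norm(u):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(c * c for c in u))

def sub(u, w):
    """Componentwise difference u - w."""
    return [a - b for a, b in zip(u, w)]

def displacement_and_bound(xi_prev, xj_prev, vi, vj):
    """Return (Delta_ij(t), ||v_i(t) - v_j(t)||), assuming x(t) = x(t-1) + v(t)."""
    xi = [a + b for a, b in zip(xi_prev, vi)]
    xj = [a + b for a, b in zip(xj_prev, vj)]
    delta = abs(norm(sub(xi, xj)) - norm(sub(xi_prev, xj_prev)))
    return delta, norm(sub(vi, vj))

rng = random.Random(1)

def rand_vec(d=3):
    return [rng.uniform(-1, 1) for _ in range(d)]

samples = [displacement_and_bound(rand_vec(), rand_vec(), rand_vec(), rand_vec())
           for _ in range(1000)]
print(all(delta <= bound + 1e-9 for delta, bound in samples))
```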



3.2 The Algebra and Geometry of Flocking 

To separate the investigation of network switches from the time analysis is one of 
the key ideas of our method. Our first task, therefore, is to bound the number 
of times the flocking network can change, while ignoring how long it takes. Next, 
we investigate the special case of time-invariant networks. In the worst case, the 
pre-convergence flying time vastly exceeds the number of network switches, so 
it is quite intuitive that a time-invariant analysis should be critical. Our next 
task is then to prove the rationality of the limit configuration. We also show 
why the hysteresis rule is sound. We follow this with an in-depth study of the 
convex geometry of flocking. We define the flight net, and with it derive what 
is arguably our most versatile analytical tool: a mathematical statement that 
captures the intuition that flocks that hope to meet in the future must match 
their velocities more and more closely over time. To do this we introduce the 
key concept of a virtual bird, which is a bird that can switch identities with its 
neighbors nondeterministically. 

Counting Network Switches. Let $N(n)$ be the maximum number of switches 
in the flocking network, ie, the number of times $t$ such that $P(t) \ne P(t+1)$. 
Obviously, $N(1) = 0$; note that, by our requirement that $C_t$ may vary only when 
$G_t$ does, we could use footprints equivalently in the definition. For the sake of our 
inductive argument, we need a uniform bound on $N(n)$ over all initial conditions. 
Specifically, we define $N(n)$ as the largest number of switches of an $n$-bird flocking 
network, given arbitrary initial conditions: for the purpose of bounding $N(n)$, 
$x(0)$ and $v(1)$ are any real vectors, with $\|v(1)\|_2 = 2^{O(p)}$. This involves building 
a quantitative framework around the existential analyses of [8, 13-15]. We now 
prove the network switching bound claimed in the "Results" section of §1. 

Lemma 3.5. The maximum number $N(n)$ of switches in the flocking network is 
bounded by $n^{O(n^3)}\bigl(p + \log\frac{1}{\varepsilon_h}\bigr)^{n-1}$. 

Corollary 3.6. Under the default settings, $N(n) = n^{O(n^3)}$. 

Proof of Lemma 3.5. We begin with the noise-free model. Fix $s > 0$ once and 
for all. For $t > s$, let $N(t,s)$ be the number of network changes between times 
$s$ and $t$, ie, the number of integers $u$ ($s < u \le t$) such that $P(u) \ne P(u-1)$. 
Since the diagonal of each $P(t)$ is positive, $\underline{P}(t,s)$ can never lose a 1 as $t$ grows, 
so there exists a smallest $T_1$ such that $\underline{P}(t,s) = \underline{P}(T_1,s)$ for all $t > T_1$. Consider 
the first column and let $n_0 < \cdots < n_l \le n$ be its successive Hamming weights 
(ie, number of ones); because $p_{11}(s) \ne 0$, $n_0 \ge 1$. We define $t_k$ as the smallest 
$t \ge s$ such that the first column of $\underline{P}(t,s)$ acquires weight $n_k$. Note that $t_0 = s$ 
and $t_l \le T_1$. How large can $N(t_{k+1},t_k)$ be, for $0 \le k < l$? Let $F$ denote the 
subgraph of $G_{t_k+1}$ consisting of the connected components (ie, flocks) that include 
the $n_k$ birds indexed by the first column of $\underline{P}(t_k,s)$. Intuitively, at time $t_k + 1$, 
bird $B_1$ can claim it has had influence over the $n_k$ birds since time $t_0$. At time 
$t_k + 2$, this influence will spread further to the neighbors of these $n_k$ birds in $F$. 
Note that having been influenced by $B_1$ in the past does not imply connectivity 
among the $n_k$ birds. 

• If $F$ contains more than $n_k$ birds then, at time $t_k + 1$, at least one of these 
extra birds, $B_i$, is adjacent in $G_{t_k+1}$ to one of the $n_k$ birds, say, $B_j$. Then, 
$p_{ij}(t_k+1) > 0$ and $p_{j1}(t_k,s) > 0$; hence $p_{i1}(t_k+1,s) \ge p_{ij}(t_k+1)\,p_{j1}(t_k,s) > 0$. 
Since $B_i$ is not one of the $n_k$ birds, $p_{i1}(t_k,s) = 0$ and the first column of 
$\underline{P}(t,s)$ acquires a new 1 between $t_k$ and $t_k + 1$. This implies that $t_{k+1} = t_k + 1$ 
and $N(t_{k+1},t_k) \le 1$. 

• Assume now that $F$ has exactly $n_k$ vertices. The flocking network $G_{t_k+1}$ 
consists of a set of flocks totalling $n_k$ birds and a separate set of flocks 
including the $n - n_k$ others. The next $N(n_k) + N(n-n_k) + 1$ network 
switches must include one between the two sets, since by then we must run 
out of allowable "intra-switches." It follows by monotonicity of $N(n)$ that 

$$N(t_{k+1},t_k) \le 1 + N(n_k) + N(n-n_k) \le 2N(n-1) + 1.$$



Figure 7: The white birds have all been influenced by $B_1$: on the left, they 
propagate that influence at the next step; on the right, they have to wait for 
flocks to join together before the influence of $B_1$ can expand further. 



In both cases, $N(t_{k+1},t_k) \le 2N(n-1) + 1$, so summing over all $0 \le k < l$, 

$$N(t_l,s) = \sum_{k=0}^{l-1} N(t_{k+1},t_k) < 2nN(n-1) + n.$$

Of course, there is nothing special about bird $B_1$. We can apply the same argument 
for each column and conclude that the time $T_1$ when the matrix $\underline{P}(t,s)$ has finally 
stabilized satisfies 

$$N(T_1,s) < 2nN(n-1) + n. \tag{4}$$

The index set $V_1$ corresponding to the ones in the first column of $\underline{P}(T_1,s)$ is called 
the first stabilizer. For $t > T_1$, no edge of $G_t$ can join $V_1$ to its complement, since 
this would immediately add more ones to the first column of $\underline{P}(t,s)$. This means 
that $B_1$ can no longer hope to influence any bird outside of $V_1$ past time $T_1$. 
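
To make the growth of the first column concrete, the sketch below tracks which birds have been influenced by a given bird along a sequence of flocking networks. It works with footprints only, using Boolean matrix products, which is legitimate because all entries are nonnegative. The edge sequence is a made-up example and the function names are ours; birds are indexed from 0 in the code.

```python
def footprint(n, edges):
    """0/1 footprint of P(t): ones on the diagonal and on every (symmetric) edge."""
    M = [[int(i == j) for j in range(n)] for i in range(n)]
    for i, j in edges:
        M[i][j] = M[j][i] = 1
    return M

def bool_matmul(X, Y):
    """Footprint of the product of two nonnegative matrices, from their footprints."""
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]

def influence_sets(n, edge_seq, bird=0):
    """Index sets of the ones in column `bird` of the footprint of P(t, s), for t = s, s+1, ..."""
    prod = footprint(n, edge_seq[0])
    sets = [frozenset(i for i in range(n) if prod[i][bird])]
    for edges in edge_seq[1:]:
        prod = bool_matmul(footprint(n, edges), prod)  # backward product: multiply from the left
        sets.append(frozenset(i for i in range(n) if prod[i][bird]))
    return sets

# Hypothetical sequence of networks on 4 birds (lists of undirected edges).
edge_seq = [[(0, 1)], [(0, 1)], [(1, 2)], [], [(2, 3)]]
sets = influence_sets(4, edge_seq)
print([sorted(s) for s in sets])
```

The sets can only grow, and the times at which they grow correspond to the $t_k$ of the proof; the final set plays the role of the first stabilizer for this finite sequence.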

Relabel the rows and columns so that all the ones in $\underline{P}(T_1,s)$'s first column 
appear on top. Then, for any $t > T_1$, $P(t)$ is a 2-block diagonal matrix with the 
top left block, indexed by $V_1 \times V_1$, providing the transitions among the vertices 
of $V_1$ at time $t$. This is a restatement of our observation regarding $G_t$ and $V_1$. 
Here is why. Since the footprint of $P(t)$ is symmetric, it suffices to consider the 
consequence of a nonzero, nondiagonal entry in $P(t)$, ie, $p_{ij}(t) > 0$, with $i \notin V_1$ 
and $j \in V_1$. This would imply that 

$$p_{i1}(t,s) \ge p_{ij}(t)\,p_{j1}(t-1,s) > 0,$$

and hence that $i \in V_1$, a contradiction. Being 2-block diagonal is invariant under 
composition, so $P(t,T_1+1)$ is also a matrix of that type. Let $A_{|V\times W}$ denote 
the submatrix of $A$ with rows indexed by $V$ and columns by $W$. Writing $V_0 = 
\{1,\dots,n\}$, for $t > T_1$, 

$$P_{|V_1\times V_0}(t,s) = P_{|V_1\times V_1}(t,T_1+1)\,P_{|V_1\times V_0}(T_1,s).$$

By setting $s$ to $T_1 + 1$ we can repeat the same argument, the only difference being 
that the transition matrices are now $|V_1|$-by-$|V_1|$. This leads to the second stabilizer 
$V_2 \subseteq V_1$, which, by relabeling, can be assumed to index the top of the subsequent 
matrices. We define $T_2$ as the smallest integer such that $\underline{P}_{|V_1\times V_1}(t,T_1+1) = 
\underline{P}_{|V_1\times V_1}(T_2,T_1+1)$ for all $t > T_2$. The set $V_2$ indexes the ones in the first column 
of $\underline{P}_{|V_1\times V_1}(T_2,T_1+1)$. Iterating in this fashion leads to an infinite sequence of 
times $T_1 < T_2 < \cdots$ and stabilizers $V_1 \supseteq V_2 \supseteq \cdots$ such that, for any $t > T_k$, 

$$P_{|V_k\times V_0}(t,s) = P_{|V_k\times V_k}(t,T_k+1)\,P_{|V_k\times V_{k-1}}(T_k,T_{k-1}+1)\cdots P_{|V_2\times V_1}(T_2,T_1+1)\,P_{|V_1\times V_0}(T_1,T_0+1),$$

where $P_{|V_i\times V_{i-1}}(T_i,T_{i-1}+1)$ is a $|V_i|$-by-$|V_{i-1}|$ matrix and $T_0 = s-1$. The stabilizers 
are the sets under refreshed influence from $B_1$. We illustrate this decomposition 
below: 



/2 0\ /II 0\ /I 0\ 

^=1 1 1 ^=5 1 1 C= 1 . 

\0 11/ \0 2/ \0 1/ 

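Consider the word $M = CB^2CABABA$. The matrix $M_{|V_6\times V_0}$ is factored as 

$$C_{|V_6\times V_5}\,B_{|V_5\times V_4}\,B_{|V_4\times V_3}\,(CA)_{|V_3\times V_2}\,B_{|V_2\times V_1}\,(ABA)_{|V_1\times V_0},$$

where $V_0 = V_1 = V_2 = \{1,2,3\}$, $V_3 = V_4 = V_5 = \{1,2\}$ and $V_6 = \{1\}$. The 
factorization looks like this: 

[Diagram of the factorization] 

with the infinite nested sequence 

$$V_1 = \{1,2,3\} \supseteq \{1,2,3\} \supseteq \{1,2\} \supseteq \{1,2\} \supseteq \{1,2\} \supseteq \{1\} \supseteq \{1\} \supseteq \{1\} \cdots$$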
What is the benefit of rewriting the top rows of $P(t,s)$ in such a complicated 
manner? The first column of each $P_{|V_i\times V_{i-1}}(T_i,T_{i-1}+1)$ consists entirely of positive 
entries, so the submultiplicativity of the ergodicity coefficients implies rapid 
convergence of the products toward a rank-one matrix. This has bearing on the 
relative displacement of birds and groupings into flocks. By Lemma 3.1, each entry 
in the first column of each $P_{|V_i\times V_{i-1}}(T_i,T_{i-1}+1)$ is at least $n^{-O(n^2)}$, so half the 
$\ell_1$-distance between any two rows is at most $1 - n^{-O(n^2)} \le e^{-n^{-O(n^2)}}$; therefore 
$\tau_1\bigl(P_{|V_i\times V_{i-1}}(T_i,T_{i-1}+1)\bigr) \le e^{-n^{-O(n^2)}}$. Lemma 3.3 
implies that $\tau_2(A) \le \tau_1(A)\,\tau_2(I) \le \sqrt{2}\,\tau_1(A)$, and 

$$\tau_2\bigl(P_{|V_k\times V_0}(t,s)\bigr) \le \sqrt{2}\,\prod_{i=1}^{k}\tau_1\bigl(P_{|V_i\times V_{i-1}}(T_i,T_{i-1}+1)\bigr) \le \sqrt{2}\,e^{-kn^{-O(n^2)}}. \tag{5}$$

Let $\chi(i,j)$ denote the $n$-dimensional vector with all coordinates equal to 0, except 
for $\chi(i,j)_i = 1$ and $\chi(i,j)_j = -1$. Note that 

$$v_i(t) - v_j(t) = \bigl((\chi(i,j)P(t-1,1)) \otimes I_d\bigr)\,v(1);$$

therefore, by Cauchy-Schwarz and (3), 

$$\|v_i(t) - v_j(t)\|_2 \le \tau_2(P(t-1,1))\,\|v(1)\|_2 \le \tau_2(P(t-1,1))\,2^{O(p)}. \tag{6}$$

If we restrict $i,j$ to $V_k$, we can replace $P(t-1,1)$ by $P_{|V_k\times V_0}(t-1,1)$ and write 

$$\|v_i(t) - v_j(t)\|_2 \le \tau_2\bigl(P_{|V_k\times V_0}(t-1,1)\bigr)\,2^{O(p)}.$$

Setting $k = n^{b_0n^2}\lceil p + \log\frac{1}{\varepsilon_h}\rceil$ for a large enough integer constant $b_0 > 0$, we derive 
from (5) that, for any $t > T_k + 1$, 

$$\max_{i,j\in V_k}\|v_i(t) - v_j(t)\|_2 \le e^{-kn^{-O(n^2)} + O(p)} < \varepsilon_h. \tag{7}$$



By Lemma 3.4, it then follows that $\Delta_{ij}(t) < \varepsilon_h$. By the hysteresis rule, this means 
that if birds $B_i$ and $B_j$ are joined after time $T_k + 1$, they will always remain so. This 
leaves at most $\binom{|V_k|}{2}$ extra network changes (final pairings), so the total number 
is conservatively bounded by 

$$N(T_k,T_{k-1}) + \cdots + N(T_1,1) + \binom{|V_k|}{2}.$$

But (4) holds for any pair $(T_i,T_{i-1}+1)$, so 

$$N(n) \le k\bigl(2nN(n-1) + n\bigr) + \binom{n}{2}.$$

Since $N(1) = 0$, for all $n > 1$, 

$$N(n) = n^{O(n^3)}\Bigl(p + \log\frac{1}{\varepsilon_h}\Bigr)^{n-1}.$$

There is a technical subtlety we need to address. In the inductive step defining 
$N(n-1)$, and more generally $N(n')$ for $n' < n$, the initial conditions and element 
sizes of the transition matrices should be treated as global parameters: they depend 
on $n$, not $n'$. In fact, it is safe to treat $n$ as a fixed parameter everywhere, 
except in the recurrence (4). The key observation is that, as $n'$ decreases, the 
bounds provided by (5) and (7) in the setting of $k = n^{b_0n^2}(p + \log\frac{1}{\varepsilon_h})$ still provide 
valid (in fact, increasingly conservative) estimates. The noise is 
handled by reapplying the bound after each of the $e^{O(n^3)}$ perturbations. □ 




Figure 8: The arborescence of birds separating into groups. 



Remark 2.1. The rationality of positions and velocities was never used in the 
proof. The only requirement is that the initial velocities of the birds should have 
Euclidean norm in $2^{O(p)}$. 

Remark 2.2. The nested sequence $V_1 \supseteq V_2 \supseteq \cdots$ is infinite but the number 
of different subsets obviously is not. The smallest stabilizer $V_k$, denoted $V_{k_1}$ to 
indicate its relation to $B_1$, cannot be empty since a bird influences itself for ever; 
hence $\{1\} \subseteq V_{k_1}$. If $|V_{k_1}| > 1$, then $B_1$ influences all the birds in $V_{k_1}$ recurrently, 
ie, infinitely often. In fact, this is true not just of $B_1$ but of all of $V_{k_1}$, all of whose 
birds influence all others in that set recurrently. The sets $V_{k_1},\dots,V_{k_n}$ are therefore 
pairwise disjoint or equal. This implies a partition of the bird set into recurrently 
self-influencing classes. One can model the process leading to it as an arborescence 
whose root corresponds to the first time the set of $n$ birds is split into two subsets 
that will no longer influence each other. Iterating in this fashion produces a tree 
whose leaves are associated with the disjoint $V_{k_j}$'s. Note that the stabilizers $V_1, V_2$, 
etc, are specific to $B_1$ and their counterparts for $B_2$ might partly overlap with them 
(except for the last one); therefore, the path in the tree toward the leaf labeled 
$V_{k_1}$ cannot be inferred directly from the stabilizers. 



Time-Invariant Flocking. Birds are expected to spend most of their time 
flying in fixed flocks. We investigate this case separately. The benefit is to derive 
a convergence time that is exponentially faster than in the general case. In this 
section, $G_t = G$ is time-invariant; for notational convenience, we assume there is 
a single flock, ie, $G$ is connected. The flocking is noise-free. We can express the 
stochastic matrix $P$ as $I_n - CL$. The corresponding Markov chain is reversible and, 
because of connectivity, irreducible. The diagonal being nonzero, it is aperiodic, 
hence ergodic. The transition matrix $P$ has the simple dominant eigenvalue 1 with 
right and left eigenvectors $\mathbf{1}$ and 

$$\pi = (\operatorname{tr} C^{-1})^{-1}\,C^{-1}\mathbf{1},$$

respectively. Lack of symmetry does not keep $P$ from being diagonalizable, though 
it denies us eigenvector orthogonality. Define 

$$M = C^{-1/2}PC^{1/2} = C^{-1/2}(I_n - CL)C^{1/2} = I_n - C^{1/2}LC^{1/2}. \tag{8}$$

Being symmetric, $M$ can be diagonalized as $\sum_{k=1}^{n}\lambda_k u_k u_k^T$, where the $u_k$'s are 
orthonormal eigenvectors and the eigenvalues are real. It follows that $P$ can 
be diagonalized as well, with the same eigenvalues. By Perron-Frobenius and 
standard properties of ergodic walks [4,22], $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > -1$ and 
$u_1 = (\sqrt{\pi_1},\dots,\sqrt{\pi_n})^T$. Since $\sum_k u_k u_k^T = I_n$, the following identity holds for all 
nonnegative $s$, including $s = 0$ (for which we must assume that $0^0 = 1$): 

$$P^s = C^{1/2}M^sC^{-1/2} = \mathbf{1}\pi^T + \sum_{k=2}^{n}\lambda_k^s\,C^{1/2}u_k u_k^T C^{-1/2}. \tag{9}$$

The left and right eigenvectors of $P$ for $\lambda_k$ are given (in column form) by $C^{-1/2}u_k$ 
and $C^{1/2}u_k$ and, together, form inverse matrices; in general, neither group forms 
an orthogonal basis. We can bound the second largest eigenvalue by using standard 
algebraic graph theory. We include a proof for completeness. 

Lemma 3.7. If $\mu = \max_{k>1}|\lambda_k|$, then $\mu \le 1 - n^{-O(1)}$. 
Proof. By the $O(\log n)$-bit encoding of $C$, each diagonal of $P$ is at least $n^{-b}$ for 
some constant $b$. The matrix $(1 - n^{-b})^{-1}(P - n^{-b}I_n)$ is stochastic and all of 
its eigenvalues lie in $[-1,1]$. It follows that $\lambda_k \ge n^{-O(1)} - 1$, for any $k \ge 1$. 
Observe now that $1 - \lambda_2$ is the smallest positive eigenvalue of the normalized 
Laplacian $C^{1/2}LC^{1/2}$. The simplicity of the eigenvalue 0 (by connectivity) implies 
that any eigenvector of the normalized Laplacian corresponding to a nonzero 
eigenvalue is normal to $C^{-1/2}\mathbf{1}$; therefore, by Courant-Fischer, 

$$1 - \lambda_2 = \min\bigl\{\,x^TC^{1/2}LC^{1/2}x \;:\; \mathbf{1}^TC^{-1/2}x = 0 \text{ and } \|x\|_2 = 1\,\bigr\}.$$

Write $y = C^{1/2}x$ and express the system in the equivalent form: $1 - \lambda_2 = \min y^TLy$, 
subject to (i) $\mathbf{1}^TC^{-1}y = 0$ and (ii) $\|C^{-1/2}y\|_2 = 1$. By using ideas from [4,12], 
we argue that, by (i), $y_m \le 0$ for some $m$, and from (ii) 
$y_M \ge (\operatorname{tr} C^{-1})^{-1/2}$ for some $M$. Since $G$ is connected, there exists a path $\mathcal{M}$ of length at most 
$n$ joining nodes $m$ and $M$. Thus, by Cauchy-Schwarz, the solution $y$ of the system 
satisfies: 

$$1 - \lambda_2 = y^TLy = \sum_{(i,j)\in G}(y_i - y_j)^2 \ge \sum_{(i,j)\in\mathcal{M}}(y_i - y_j)^2 \ge \frac{1}{n}\Bigl(\sum_{(i,j)\in\mathcal{M}}|y_i - y_j|\Bigr)^2 \ge \frac{(y_M - y_m)^2}{n} \ge \frac{1}{n\operatorname{tr} C^{-1}} = n^{-O(1)}.$$

□ 



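By (9), for all $i,j,s > 0$, $(P^s)_{ij} \ge \pi_j - \sum_{k>1}|\lambda_k|^s\sqrt{c_i/c_j}\,|(u_k)_i(u_k)_j| \ge \pi_j - n^{O(1)}\mu^s$. 
A similar derivation gives us the corresponding upper bound; so, by Lemma 3.7, 

$$\|P^s - \mathbf{1}\pi^T\|_F \le e^{-sn^{-O(1)} + O(\log n)}. \tag{10}$$

Similarly, for $s \ge n^{c_0}$, for a constant $c_0$ large enough, 

$$\tau_1(P^s) = 1 - \min_{i,j}\sum_{k=1}^{n}\min\{(P^s)_{ik},(P^s)_{jk}\} \le 1 - \sum_{k=1}^{n}\bigl(\pi_k - n^{O(1)}e^{-sn^{-O(1)}}\bigr) = n^{O(1)}e^{-sn^{-O(1)}} < 1. \tag{11}$$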



To simplify the notation, constants such as $b$ and $c$ are reused frequently in the text, with 
their values depending on the context. 

The Frobenius norm of a matrix is the Euclidean norm of the vector formed by 
its elements. The property we will use most often is a direct consequence of Cauchy-Schwarz, 
$\|Mu\|_2 \le \|M\|_F\|u\|_2$, and more generally the submultiplicativity of the norm. 

Given a vector $x$ in $\mathbb{R}^n$, consider the random variable $X$ formed by picking 
the $i$-th coordinate of $x$ with probability $\pi_i$. As claimed in the introduction, the 
variance of $X$ is a quadratic Lyapunov function. This is both well known and 
intuitively obvious since we are sampling from the stationary distribution of an 
ergodic Markov chain and then taking one "mixing" step: the standard deviation 
decreases at a rate given by the Fiedler value. As was observed in [19], because 
the random variable involves only $\pi$ and not $P$, any flock switching that keeps the 
graph connected with the same stationary distribution admits a common quadratic 
Lyapunov function. If $x = \mathbf{1}$, then obviously, $\operatorname{var}X = 0$. We now show that the 
variance decays exponentially fast. 

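Lemma 3.8. $\operatorname{var}(PX) \le \mu^2(\operatorname{var}X)$. 

Proof. For any $x$, the vector $y = (I_n - \mathbf{1}\pi^T)x$ is such that $C^{-1/2}y$ is orthogonal to 
$u_1 = (\sqrt{\pi_1},\dots,\sqrt{\pi_n})^T$. Therefore the former lies in the contractive eigenspace of 
$M$ and 

$$\|M(C^{-1/2}y)\|_2 \le \mu\,\|C^{-1/2}y\|_2;$$

hence, by (8), 

$$(Py)^TC^{-1}(Py) = (y^TC^{-1/2})(C^{1/2}P^TC^{-1/2})(C^{-1/2}PC^{1/2})(C^{-1/2}y) = \|MC^{-1/2}y\|_2^2 \le \mu^2\,\|C^{-1/2}y\|_2^2.$$

As a result, 

$$(Py)^TC^{-1}(Py) \le \mu^2\,y^TC^{-1}y.$$

Since $\pi = (\operatorname{tr} C^{-1})^{-1}C^{-1}\mathbf{1}$, 

$$\operatorname{var}X = \sum_{i=1}^{n}\pi_i y_i^2 = (\operatorname{tr} C^{-1})^{-1}\,y^TC^{-1}y.$$

Because $P$ commutes with $I_n - \mathbf{1}\pi^T$, 

$$\operatorname{var}(PX) = (\operatorname{tr} C^{-1})^{-1}(Py)^TC^{-1}(Py) \le \mu^2(\operatorname{var}X),$$

and $\operatorname{var}X$ is the desired Lyapunov function. □ 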
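
The Lyapunov property can be observed on a small example. The following sketch builds $P = I_n - CL$ for a path graph with hypothetical rational weights `c` (chosen by us so that each $c_i$ times the degree of node $i$ is below 1), forms $\pi$ proportional to $C^{-1}\mathbf{1}$, and tracks the variance of $X$ under repeated application of $P$. All names are ours; the arithmetic is exact.

```python
from fractions import Fraction as F

def path_flock(c):
    """P = I - C L for the path 0-1-...-(n-1); assumes c[i] * degree(i) < 1."""
    n = len(c)
    P = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            P[i][j] = c[i]
        P[i][i] = 1 - c[i] * len(nbrs)
    return P

def stationary(c):
    """pi = (tr C^{-1})^{-1} C^{-1} 1."""
    tr = sum(1 / ci for ci in c)
    return [(1 / ci) / tr for ci in c]

def apply(P, x):
    """Matrix-vector product P x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def variance(x, pi):
    """Variance of the random variable that picks x[i] with probability pi[i]."""
    mean = sum(p * xi for p, xi in zip(pi, x))
    return sum(p * (xi - mean) ** 2 for p, xi in zip(pi, x))

def variance_trace(c, x, steps):
    """var X, var(PX), var(P^2 X), ... for the path flock with weights c."""
    P, pi = path_flock(c), stationary(c)
    out = [variance(x, pi)]
    for _ in range(steps):
        x = apply(P, x)
        out.append(variance(x, pi))
    return out

c = [F(1, 3), F(1, 4), F(1, 3), F(1, 2)]   # hypothetical weights
x = [F(0), F(1), F(4), F(-2)]
trace = variance_trace(c, x, 10)
print([float(v) for v in trace])
```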



What both (11) and Lemma 3.8 indicate is that convergence for a time-invariant 
flock evolves as $e^{-tn^{-O(1)}}$, whereas in general the best we can do is 
invoke (5) and hope for a convergence speed of the form $e^{-tn^{-O(n^2)}}$, which is 
exponentially slower. 

The Rationality of Limit Configurations. The locations of the birds remain 
rational at all times. Does this mean that in the limit their configurations remain 
so? We prove that this is, indeed, the case. We do not do this simply out of 
curiosity. This will be needed for the analysis of convergence. We cover the case 
of a time-invariant connected network here and postpone the general case for later. 
For $t > 0$, we define 

$$\Gamma_t = -\mathbf{1}\pi^T t + \sum_{s=0}^{t-1}P^s. \tag{12}$$

It is immediate that $\Gamma_t$ converges to some matrix $\Gamma$, as $t$ goes to infinity. Indeed, 
by (9), 

$$\Gamma = \sum_{s\ge 0}\,\sum_{k>1}\lambda_k^s\,C^{1/2}u_k u_k^T C^{-1/2}.$$

What is perhaps less obvious is why the limit is rational. We begin with a simple 
characterization of $\Gamma$, which we derive by classical arguments about fundamental 
matrices for Markov chains [11]. We also provide a more ad hoc characterization 
(Lemma 3.10) that will make later bound estimations somewhat easier. 

Lemma 3.9. As $t \to \infty$, $\Gamma_t$ converges to $\Gamma = -\mathbf{1}\pi^T + (I_n - P + \mathbf{1}\pi^T)^{-1}$. 

Proof. Because $\mathbf{1}$ and $\pi$ are respectively right and left eigenvectors of $P$ for the eigenvalue 1, for any integer $s > 0$, 

$$(P - \mathbf{1}\pi^T)^s = P^s - \mathbf{1}\pi^T. \tag{13}$$

This follows from the identity 

$$(P - \mathbf{1}\pi^T)^s = P^s + \sum_{k=0}^{s-1} (-1)^{s-k}\binom{s}{k} P^k (\mathbf{1}\pi^T)^{s-k} = P^s + (\mathbf{1}\pi^T)\sum_{k=0}^{s-1} (-1)^{s-k}\binom{s}{k} = P^s - \mathbf{1}\pi^T.$$

And so, for $t > 1$, 

$$\Gamma_t + \mathbf{1}\pi^T = I_n + \sum_{s=1}^{t-1} (P^s - \mathbf{1}\pi^T) = \sum_{s=0}^{t-1} (P - \mathbf{1}\pi^T)^s.$$

Pre-multiplying this identity by the "denominator" that we expect from the geometric sum, ie, $I_n - P + \mathbf{1}\pi^T$, we simplify the telescoping sum, using (13) again, 

$$(I_n - P + \mathbf{1}\pi^T)(\Gamma_t + \mathbf{1}\pi^T) = (I_n - P + \mathbf{1}\pi^T)\sum_{s=0}^{t-1} (P - \mathbf{1}\pi^T)^s = I_n - (P - \mathbf{1}\pi^T)^t = I_n - (P^t - \mathbf{1}\pi^T).$$

By (9), $P^t$ converges to $\mathbf{1}\pi^T$ as $t$ goes to infinity, so $(I_n - P + \mathbf{1}\pi^T)(\Gamma_t + \mathbf{1}\pi^T)$ converges to the identity. This implies that, for $t$ large enough, the matrix cannot be singular and, hence, neither can $I_n - P + \mathbf{1}\pi^T$. This allows us to write: 

$$\Gamma + \mathbf{1}\pi^T = (I_n - P + \mathbf{1}\pi^T)^{-1}.$$

□ 

There is another characterization of $\Gamma$ without $\pi$ in the inverse matrix. We use the notation $(Y \mid y)$ to refer to the $n$-by-$n$ matrix derived from $Y$ by replacing its last column with the vector $y$. 

Lemma 3.10. $\Gamma = (I_n - \mathbf{1}\pi^T \mid 0)\,(I_n - P \mid \mathbf{1})^{-1}$. 

Proof. Since $\pi$ is a left eigenvector of $P$ for 1, $\mathbf{1}\pi^T(I_n - P) = 0$; hence, for $t > 0$, 

$$I_n - P^t = (I_n + P + \cdots + P^{t-1})(I_n - P) = (\Gamma_t + t\,\mathbf{1}\pi^T)(I_n - P) = \Gamma_t (I_n - P).$$

As $t \to \infty$, $P^t \to \mathbf{1}\pi^T$; therefore $\Gamma(I_n - P) = I_n - \mathbf{1}\pi^T$. Since $\mathbf{1}$ lies in the kernel of $\Gamma_t$, and hence of $\Gamma$, the latter matrix satisfies the relation 

$$\Gamma\,(I_n - P \mid \mathbf{1}) = (I_n - \mathbf{1}\pi^T \mid 0). \tag{14}$$

The simplicity of $P$'s dominant eigenvalue 1 implies that $I_n - P$ is of rank $n - 1$. Since $\mathbf{1} \in \ker(I_n - P)$, the last column of $I_n - P$ is the negative sum of the others; so to get the correct rank the first $n - 1$ columns of $I_n - P$ must be independent. Note that the vector $\mathbf{1}$ is not in the space they span: if, indeed, it were, we would have $\mathbf{1} = (I_n - P)y$, for some $y \in \mathbb{R}^n$. Since $\pi^T(I_n - P) = 0$, this would imply that $1 = \pi^T\mathbf{1} = \pi^T(I_n - P)y = 0$, a contradiction. This shows that $(I_n - P \mid \mathbf{1})$ is of full rank, which, by (14), completes the proof. □ 
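
The two characterizations of the limit matrix can be compared in exact rational arithmetic. The sketch below is only an illustration under stated assumptions: the 3-bird path flock and its weights are hypothetical, and the helper names (`mul`, `add`, `inverse`, `last_column`) are ours. It computes the matrix of Lemma 3.9, the matrix of Lemma 3.10, and checks the telescoping identity used in the proof of Lemma 3.9 for one finite $t$.

```python
from fractions import Fraction as F

n = 3
# Hypothetical 3-bird path flock (assumed weights: 1/3 on oneself, 2/3 split
# equally among neighbors); pi is its stationary distribution.
P = [[F(1, 3), F(2, 3), F(0)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(0), F(2, 3), F(1, 3)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]
one_pi = [[pi[j] for j in range(n)] for _ in range(n)]   # the matrix 1 pi^T


def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(A, B, sign=1):
    return [[A[i][j] + sign * B[i][j] for j in range(n)] for i in range(n)]


def inverse(A):
    """Gauss-Jordan elimination over the rationals."""
    M = [row[:] + I[i][:] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def last_column(A, col):
    """The matrix (A | col): A with its last column replaced by col."""
    return [row[:-1] + [col[i]] for i, row in enumerate(A)]


# Lemma 3.9: Gamma = -1 pi^T + (I - P + 1 pi^T)^{-1}
gamma9 = add(inverse(add(add(I, P, -1), one_pi)), one_pi, -1)
# Lemma 3.10: Gamma = (I - 1 pi^T | 0) (I - P | 1)^{-1}
gamma10 = mul(last_column(add(I, one_pi, -1), [F(0)] * n),
              inverse(last_column(add(I, P, -1), [F(1)] * n)))

# Telescoping identity from the proof of Lemma 3.9, checked for t = 6.
t, power, gamma_t = 6, I, [[F(0)] * n for _ in range(n)]
for _ in range(t):
    gamma_t = add(gamma_t, add(power, one_pi, -1))   # adds P^s - 1 pi^T
    power = mul(power, P)                            # ends as P^t
lhs = mul(add(add(I, P, -1), one_pi), add(gamma_t, one_pi))
rhs = add(I, add(power, one_pi, -1), -1)

print(gamma9 == gamma10, lhs == rhs)
```

Working over `Fraction` rather than floating point mirrors the point of this passage: every quantity involved is rational, so equality can be tested exactly.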



The motion equation (1) becomes, for $t > 1$, 

$$x(t) = x(0) + \Bigl(\Bigl(\sum_{s=0}^{t-1} P^s\Bigr) \otimes I_d\Bigr)\, v(1) \tag{15}$$

or, equivalently, by (12), 

$$x(t) = x(0) + t\,((\mathbf{1}\pi^T) \otimes I_d)\,v(1) + (\Gamma_t \otimes I_d)\,v(1). \tag{16}$$

We call $m_\pi[x(t)] = (\pi^T \otimes I_d)\,x(t)$ the mass center of the flock and the vector $m_\pi[v(1)]$ its stationary velocity. The latter is the first spectral (vector) coefficient of the velocity. In our lower bound, we will make it the first Fourier coefficient of the dynamical system. The mass center drifts in space at constant speed along a fixed line in $d$-space: Indeed, $\pi^T\Gamma_t = 0$, so by (16), 

$$m_\pi[x(t)] = m_\pi[x(0)] + t\,m_\pi[v(1)]$$

and 

$$x(t) = \underbrace{x(0)}_{\text{start}} + \underbrace{t\,(\mathbf{1} \otimes I_d)\,m_\pi[v(1)]}_{\text{linear drift}} + \underbrace{(\Gamma_t \otimes I_d)\,v(1)}_{\text{damped oscillator}}. \tag{17}$$

The oscillations are damped at a rate of $e^{-t\,n^{-O(1)}}$. (We use the term not in the "harmonic" sense but by reference to the negative eigenvalues that might cause actual oscillations.) Moving the origin to the mass center of the birds, we express $x(t)$, relative to this moving frame, as 

$$x^r(t) = x(t) - (\mathbf{1} \otimes I_d)\,m_\pi[x(t)];$$

therefore, by simple tensor manipulation, 

$$x(t) = x^r(t) + ((\mathbf{1}\pi^T) \otimes I_d)\,x(0) + t\,((\mathbf{1}\pi^T) \otimes I_d)\,v(1); \tag{18}$$

and, by (16), 

$$x^r(t) = x(t) - ((\mathbf{1}\pi^T) \otimes I_d)\,x(t) = ((I_n - \mathbf{1}\pi^T) \otimes I_d)\,x(0) + (\Gamma_t \otimes I_d)\,v(1)$$



and, by Lemma 3.9 



Lemma 3.11. If $G$ is connected, the relative flocking configuration $x^r(t)$ converges to the limit 

$$x^r = ((I_n - \mathbf{1}\pi^T) \otimes I_d)\,x(0) + (\Gamma \otimes I_d)\,v(1).$$

The mass center of the configuration moves in $\mathbb{R}^d$ at constant speed in a fixed direction. 
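
The linear drift of the mass center is easy to observe numerically. The following sketch iterates the motion $x(t) = x(t-1) + v(t)$, $v(t+1) = Pv(t)$ for a hypothetical time-invariant 3-bird flock in dimension 1. The matrix, the initial positions and the initial velocities are all assumptions chosen for illustration, and the network is held fixed by fiat rather than recomputed from distances.

```python
from fractions import Fraction as F

# Hypothetical time-invariant 3-bird path flock (assumed weights).
P = [[F(1, 3), F(2, 3), F(0)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(0), F(2, 3), F(1, 3)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]   # stationary distribution of P


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x = [F(0), F(1, 2), F(1)]        # x(0): hypothetical initial positions (d = 1)
v = [F(1), F(-1, 2), F(2)]       # v(1): hypothetical initial velocities
m0, drift = dot(pi, x), dot(pi, v)   # mass center at time 0, stationary velocity

centers, spreads = [], []
for t in range(1, 9):
    x = [a + b for a, b in zip(x, v)]       # x(t) = x(t-1) + v(t)
    centers.append(dot(pi, x))              # mass center at time t
    spreads.append(max(v) - min(v))         # velocity disagreement at time t
    v = [dot(row, v) for row in P]          # v(t+1) = P v(t)

print(centers[:3], [float(s) for s in spreads])
```

The mass centers follow $m_\pi[x(0)] + t\,m_\pi[v(1)]$ exactly, while the velocity disagreement shrinks geometrically, which is the damped term of (17).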

Lemma 3.12. The elements of $\Gamma$ and the coordinates of the limit configuration $x^r$ are CD-rationals over $O(n\log n)$ and $O(n\log n + pn)$ bits, respectively. 

Proof. Let $C_b$ denote the $O(n\log n)$-bit long product of all the denominators in the diagonal matrix $C$. The determinant of $(CL \mid \mathbf{1})$ can be expressed as $C_b^{-1}$ times the determinant $N$ of an $n$-by-$n$ matrix with $O(\log n)$-bit integer elements. By the Hadamard bound [30], $N$ is an $O(n\log n)$-bit integer. For the same reason, each element of $\mathrm{adj}\,(CL \mid \mathbf{1})$ is also the product of $C_b^{-1}$ with an $O(n\log n)$-bit integer; therefore, 

$$(I_n - P \mid \mathbf{1})^{-1} = (CL \mid \mathbf{1})^{-1} = \frac{\mathrm{adj}\,(CL \mid \mathbf{1})}{\det\,(CL \mid \mathbf{1})}$$

is of the form $N^{-1}$ times an $O(n\log n)$-bit integer matrix (since the two appearances of $C_b^{-1}$ cancel out). The same is true of $(I_n - \mathbf{1}\pi^T \mid 0)$: this is because, trivially, $\pi^T = (0, \ldots, 0, 1)(I_n - P \mid \mathbf{1})^{-1}$. Therefore, both $(I_n - \mathbf{1}\pi^T \mid 0)$ and $(I_n - P \mid \mathbf{1})^{-1}$ are matrices with CD-rational coordinates over $O(n\log n)$ bits. Lemma 3.11, with the formulation of Lemma 3.10 for $\Gamma$, completes the proof. □ 

This implies that x{t) tends toward a+bt, where a, b are rational vectors. Since 
the number of switches and perturbations is finite, this proves the rationality claim 
made in ^ □ 
Soundness of the Hysteresis Rule. We begin with a proof that hysteresis 
is required to ensure convergence. We build a 4-bird flock in one dimension, 
whose network cannot converge without a hysteresis rule. The construction can 
be trivially lifted to any dimension. The speed of the birds will decay exponentially. 
In real life, of course, the birds would stall. But, as we mentioned earlier, we can 
add a large fixed velocity to all the birds without altering the flocking process. 
Stalling, therefore, is a nonissue, here and throughout this work. These are the 
initial conditions: 

$$x(0) = \tfrac{1}{16}\,(0, 8, 21, 29); \qquad v(1) = \tfrac{1}{8}\,(1, -1, 1, -1).$$

The flocking network alternates between a pair of 2-bird edges and a single 4-bird path, whose respective transition matrices are: 

$$\frac{1}{3}\begin{pmatrix} 1 & 2 & 0 & 0\\ 2 & 1 & 0 & 0\\ 0 & 0 & 1 & 2\\ 0 & 0 & 2 & 1 \end{pmatrix} \qquad\text{and}\qquad \frac{1}{3}\begin{pmatrix} 1 & 2 & 0 & 0\\ 1 & 1 & 1 & 0\\ 0 & 1 & 1 & 1\\ 0 & 0 & 2 & 1 \end{pmatrix}.$$

The beauty of the initial velocity $v(1)$ is that it is a right eigenvector for both flocking networks for the same eigenvalue $-\frac{1}{3}$; therefore, for $t > 0$, $v(t) = (-\frac{1}{3})^{t-1}\,v(1)$ and, by the equation of motion, 

$$x(t) = x(0) + \sum_{s=1}^{t} v(s) = x(0) + \tfrac{3}{4}\bigl(1 - (-\tfrac{1}{3})^{t}\bigr)\,v(1). \tag{19}$$

It follows that 

$$x_{i+1}(t) - x_i(t) = \frac{1}{16} \times \begin{cases} 5 - (-\frac{1}{3})^{t-1} & \text{if } i = 1, 3;\\[2pt] 16 + (-\frac{1}{3})^{t-1} & \text{if } i = 2. \end{cases}$$

The distance between the first and second birds stays comfortably between $\frac{1}{4}$ and $\frac{1}{2}$; same with birds $B_3$ and $B_4$. The distance between the middle birds $B_2$ and $B_3$ oscillates around 1, so the network forever alternates between one and two connected components. The pairs $(B_1, B_3)$ and $(B_2, B_4)$ form fixed inter-bird distances of $\frac{21}{16}$, so the flocks are always simple paths. This proves the necessity of hysteresis. As we said earlier, virtually any hysteresis rule would work. Ours is chosen out of convenience. 
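
The 4-bird construction can be replayed step by step. The sketch below is a simulation under stated assumptions rather than part of the proof: it takes the initial conditions to be $x(0) = \frac{1}{16}(0, 8, 21, 29)$ and $v(1) = \frac{1}{8}(1, -1, 1, -1)$, joins two birds whenever they are within unit distance (no hysteresis), and lets each bird keep weight 1/3 on its own velocity while splitting 2/3 among its neighbors, which reproduces the two transition matrices of this example. The function names are ours.

```python
from fractions import Fraction as F

x = [F(0), F(8, 16), F(21, 16), F(29, 16)]       # x(0), assumed initial positions
v = [F(1, 8), F(-1, 8), F(1, 8), F(-1, 8)]       # v(1), assumed initial velocities


def neighbors(x):
    """Adjacency without hysteresis: birds within unit distance are joined."""
    n = len(x)
    return [[j for j in range(n) if j != i and abs(x[i] - x[j]) <= 1]
            for i in range(n)]


def step_velocity(v, adj):
    """Each bird keeps 1/3 of its velocity and splits 2/3 among its neighbors."""
    return [F(1, 3) * v[i] + F(2, 3) * sum(v[j] for j in nb) / len(nb) if nb else v[i]
            for i, nb in enumerate(adj)]


def components(adj):
    """Number of connected components of the flocking network."""
    seen, count = set(), 0
    for s in range(len(adj)):
        if s not in seen:
            count += 1
            stack = [s]
            while stack:
                u = stack.pop()
                if u not in seen:
                    seen.add(u)
                    stack.extend(adj[u])
    return count


history = []
for t in range(12):
    adj = neighbors(x)                           # network G_t read off x(t)
    history.append((components(adj), x[2] - x[1]))
    if t > 0:
        v = step_velocity(v, adj)                # v(t+1) = P(t) v(t)
    x = [a + b for a, b in zip(x, v)]            # x(t+1) = x(t) + v(t+1)

print([c for c, _ in history])
```

The recorded component counts alternate between one and two at every step, and the middle gap matches $1 + \frac{1}{16}(-\frac{1}{3})^{t-1}$ exactly, so the network never settles without a hysteresis rule.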



Lemma 3.13. The hysteresis rule is sound: (i) any two birds within unit distance of each other at time $t$ share an edge of $G_t$; (ii) no two birds at distance greater than $1 + \gamma\varepsilon_h$ are ever adjacent in $G_t$, where 

$$\gamma = \bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{n}\, n^{O(n^3)}.$$



Figure 9: The flocking network alternates between two configurations forever and never converges. 



Corollary 3.14. Under the default settings any two birds within unit dis- 
tance of each other at time t share an edge of Gt; on the other hand, no two birds 
at distance greater than 1 + are ever adjacent in Gt- 



Proof of Lemma 3.13. Part (i) is true by definition. To prove part (ii), assume by contradiction that, at time $t_0$, two birds $B_i$ and $B_j$ are within unit distance of each other but further than 1 apart at time $t_0 + 1$. Write 

$$\delta = \varepsilon_h \bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{n}\, n^{b_0 n^3}, \tag{20}$$

for some large enough constant $b_0$. Assume also that the distance is greater than $1 + \delta$ at time $t_1 > t_0$ and that, between $t_0$ and $t_1$, the distance always remains in the interval $(1, 1 + \delta]$ and that the two birds are joined in $G_t$ for all $t \in [t_0, t_1]$. Such conditions would violate soundness, so we show they cannot happen. Obviously, they imply that the distance between the two birds never jumps (up or down) by $\varepsilon_h$ or more, since otherwise the hysteresis rule would cease to apply and the edge $(i, j)$ would break. This means that $\Delta_{ij}(t) < \varepsilon_h$, for $t_0 < t \le t_1$. 



Consider the $t_1 - t_0$ relative displacements in the time interval $[t_0, t_1]$. Together they create a displacement in excess of $\delta$. Let $\kappa = e^{O(n^3)}$ be the number of steps witnessing noise. Mark the unit-time intervals within $[t_0, t_1]$ that are associated with relative displacements witnessing a perturbation or a network switch: there are at most $N(n) + \kappa$ of those, each one associated with a displacement less than $\varepsilon_h$, so this leaves us with a total displacement greater than $\delta - \varepsilon_h N(n) - \varepsilon_h\kappa$. This is contributed by no more than $N(n) + \kappa + 1$ runs of consecutive unmarked unit-time intervals. By the pigeonhole principle, one of these runs contributes a total displacement of at least $(\delta - \varepsilon_h N(n) - \varepsilon_h\kappa)/(N(n) + \kappa + 1)$. If $[s_0, s_1]$ denotes the corresponding time interval ($t_0 \le s_0 < s_1 \le t_1$), then $G_t$ remains invariant for all $s_0 \le t < s_1$ and, by Lemma 3.5, 

$$\sum_{t=s_0+1}^{s_1} \Delta_{ij}(t) \ge \frac{\delta - \varepsilon_h N(n) - \varepsilon_h\kappa}{N(n) + \kappa + 1} \ge \delta\, n^{-O(n^3)}\bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{1-n}. \tag{21}$$

Figure 10: The distance between two adjacent birds cannot exceed 1 by more than $\delta$ before the edge breaks. 



We now show that this displacement is too large for two birds in the same time-invariant flock for so long. The edge $(i, j)$ is in the network $G_t$ for all $t \in [s_0, s_1]$, so the two birds $B_i$ and $B_j$ are in the same flock during that time period. We already observed that $\tau_2(A) \le \sqrt{2}\,\tau_1(A)$. By (11) and Lemmas 3.3 and 3.4, it follows that, for $s_0 < t \le s_1$, 

$$\Delta_{ij}(t) \le \|v_i(t) - v_j(t)\|_2 \le \tau_2(P(t-1, s_0))\,2^{O(p)} \le \tau_1(P^{n^{c_0}}(s_0))^{\lfloor (t-s_0)\,n^{-c_0}\rfloor}\,2^{O(p)} \le 2^{-\lfloor (t-s_0)\,n^{-c_0}\rfloor + O(p)}. \tag{22}$$

Technically, the way we phrased it, our derivation assumes that the flock that contains the birds $B_i$ and $B_j$ at times $s_0$ through $s_1$ includes all the birds. This is only done for notational convenience, however, and the case of smaller flocks can be handled in exactly the same way. By (21, 22) and the hysteresis rule, 

$$\delta\, n^{-O(n^3)}\bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{1-n} \le \sum_{t=s_0+1}^{s_1} \Delta_{ij}(t) \le \sum_{t=s_0+1}^{s_1} \min\bigl\{\varepsilon_h,\; 2^{-\lfloor (t-s_0)\,n^{-c_0}\rfloor + O(p)}\bigr\} \le \min_{T > 0}\bigl\{T\varepsilon_h + 2^{-\lfloor T n^{-c_0}\rfloor + O(p)}\bigr\}.$$
Setting T = 2" rp + log f ] leads to 

for some positive constant $b_1$ independent of the constant $b_0$ used in the definition (20) of $\delta$. Choosing $b_0$ large enough thus contradicts our choice of $\delta$. The two birds therefore cannot be both joined and apart by more than $1 + \delta$. □ 



The Geometry of Flocking: The Virtual Bird. Can birds fly in giant loops 
and come back to their point of origin? Are there constraints on their trajectories? 
We show that, after enough time has elapsed, two birds can be newly joined only 
if they fly almost parallel to each other. We also prove that they cannot 
stray too far from each other if they want to get together again in the future. We 
investigate the geometric structure of flocking and, to help us do so, we introduce 
a useful device, the flight net. 




Figure 11: The flight net is formed by joining together the convex polytopes 
associated with birds' new velocities. 



It is convenient to lift the birds into $\mathbb{R}^{d+1}$ by adding time as an extra dimension:* $x(t) \mapsto (x_1(t), \ldots, x_d(t), t)^T$; $v(t) \mapsto (v_1(t), \ldots, v_d(t), 1)^T$. Since $\mathbf{1}$ is a right eigenvector, this lifting still satisfies the equation of motion. The hysteresis rule kicks in at the same time and in the same manner as before; in fact, the lifting has no bearing whatsoever on the behavior of the birds. The angular offset $\angle(x_i(t), v_i(t))$, denoted by $\omega_i(t)$, plays an important role in the analysis.† It represents (roughly) how the trajectory of bird $B_i$ deviates at time $t$ from what it would have been had the bird reached its current position by flying along a straight line. We will show that the angular offset decreases roughly as $(\log t)/t$. This fact has many important consequences. 

* This is not a projectivization. 

† We use $x_i(t)$ as both a point and a vector, trusting the context to make it obvious which is which. 

Instead of following a given bird over time and investigating its trajectory 
locally, we track an imaginary bird that has the ability to switch identities with its 
neighbors: this virtual bird could be Bi for a while and then decide, at any time, to 
become any Bj adjacent to it in the flock. Or, for a rather implausible but helpful 
image, think of a bird passing the baton to any of its neighbors: whoever holds 
the baton is the virtual bird. Its trajectory is highly nondeterministic, as it is 
allowed to follow any path in the flight net. Although in the end we seek answers 
that relate to physical birds, virtuality will prove to be a very powerful analytical 
device. It allows us to answer questions such as: Can a virtual bird fly (almost) 
along a straight line? How far apart can two birds get if they are to meet again 
later? Another key idea is to trace the flight path of virtual birds backwards in 
time. This is how we are able to translate stochasticity into convexity and thus 
bring in the full power of geometry into the picture. The translation emanates 
from this simple consequence of the velocity equation, $v(t) = (P(t-1) \otimes I_d)\,v(t-1)$: 

$$v_i(t) \in \mathrm{Conv}\,\{\, v_j(t-1) \mid (i, j) \in G_{t-1} \,\}.$$

By iterating in this fashion, we create the flight net $\mathcal{N}_i(t)$ of bird $B_i$ at time $t > 0$. It is a connected collection of line segments (ie, a 1-skeleton): $\mathcal{N}_i(t) = \mathcal{N}_i(t, K_t)$, where $K_t$ is a large integer parameter. Specifically, we set 

$$K_t = \lceil n^{b_0}(p + \log t) \rceil \tag{23}$$

for a big enough constant $b_0$. The power of the flight net comes from its ability to deliver both kinetic and positional information about the "genealogy" of a bird's current state. Let $K$ be an arbitrary positive integer; we define $\mathcal{N}_i(t, K)$ inductively as follows. The case $t = 1$ is straightforward: $\mathcal{N}_i(t, K)$ consists of the single line segment $x_i(0)x_i(1)$. Suppose that $t > 1$. We say that time $s$ is critical if $s < K$ or if, during the time interval $[s - K, s]$, there is a perturbation or a network switch, ie, the velocity of at least one flock is multiplied by $I_m \otimes \widehat{\alpha}$ or $G_u \ne G_{u+1}$ for some $u$ ($s - K \le u < s$). 

• If $t$ is critical, then $\mathcal{N}_i(t, K)$ consists of the segment $x_i(t-1)x_i(t)$, together with the translates $\mathcal{N}_j(t-1, K) + x_i(t-1) - x_j(t-1)$, for all $(i, j) \in G_{t-1}$ and $j = i$. 

• If $t$ is noncritical, then $\mathcal{N}_i(t, K)$ consists of the segment $x_i(t-1)x_i(t)$, together with $\mathcal{N}_i(t-1, K)$. 
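
The notion of a critical time drives the whole construction, so a small sketch may help fix ideas. The function below is a hypothetical helper, not part of the analysis: it treats perturbations and network switches uniformly as a list of event times $u$ and declares $s$ critical if $s < K$ or if some event satisfies $s - K \le u < s$ (a simplification of the definition above, which uses the closed window for perturbations).

```python
def critical_times(t_max, K, events):
    """Times s in 1..t_max that are critical: s < K, or some event time u
    (a perturbation or a network switch) satisfies s - K <= u < s."""
    events = sorted(events)
    return [s for s in range(1, t_max + 1)
            if s < K or any(s - K <= u < s for u in events)]


# One event at time 10 with window K = 3: the early times 1, 2 are critical,
# and so are the K times that follow the event.
print(critical_times(20, 3, [10]))   # [1, 2, 11, 12, 13]
```

Each event makes at most $K$ times critical, which is why the number of critical times is later bounded by $K_t$ times the number of switches and perturbations.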

Every flight net has an antenna sitting on top, which is a line segment extending from $X_{d+1} = t - 1$ to $X_{d+1} = t$ in the case of $\mathcal{N}_i(t, K)$. In the noncritical case, the antenna is connected on top of the previous one, ie, the one for $\mathcal{N}_i(t-1, K)$. Otherwise, we slide the time-$(t-1)$ flight nets of the adjacent birds so that their antennas join with the bottom vertex of the new antenna: this shift is called the baton-passing drift. 

Here is the intuition. Flying down the top antenna of the net, the virtual 
bird hits upon another antenna: either there is only one to choose from, in which 
case it is almost collinear (because of noncriticality, the corresponding random 
walk is thoroughly mixed) or else the virtual bird discovers a whole bouquet of 
antennas and picks one of them. Because the old antenna is a convex combination 
of the new ones, the virtual bird can continue its backward flight by choosing 
from a convex cone of directions: this freedom is the true benefit of convexity and, 
hence, stochasticity. This is when the baton is passed: the virtual bird changes its 
correspondence with an actual bird as it chooses one of these directions. Because of 
the translation by $x_i(t-1) - x_j(t-1)$, this change of correspondence is accompanied 
by a shift of length at most one, what we dub the baton-passing drift. 

Viewed from a suitable perspective, the flight net provides a quasi-convex 
structure from which all sorts of metric information can be inferred. Most im- 
portant, it yields the crucial Escape Lemma, which implies that, as time goes by, 
it becomes increasingly easy to predict the velocity of a bird from its location, 
and vice versa. The lemma asserts that the bird flies in a direction that points 
increasingly away from its original position. We begin with a simple observation. 
For any time $t > 0$, the $(d+1)$-dimensional vector 

$$w_i(t) = \frac{1}{t}\,x_i(t) \tag{24}$$

represents the constant velocity that bird $B_i$ would need to have if it were to leave the origin at time 0 and be at position $x_i(t)$ at time $t$ while flying in a fixed direction. Recall that the angular offset $\omega_i(t)$ is $\angle(x_i(t), v_i(t))$; we show that it cannot deviate too much from the velocity offset $\|v_i(t) - w_i(t)\|_2$. 




Lemma 3.15. For any $t > 0$, 

$$2^{-O(p)}\,\|v_i(t) - w_i(t)\|_2 \le \omega_i(t) \le O(\|v_i(t) - w_i(t)\|_2).$$

Proof. Consider the triangle $ABC$ formed by identifying $AB$ with $v_i(t)$ and $AC$ with $w_i(t)$, and let $\alpha, \beta, \gamma$ be the angles opposite $BC$, $CA$, $AB$, respectively. Note that $\alpha = \omega_i(t)$ and $\|v_i(t) - w_i(t)\|_2 = |BC|$. Assume that $\beta \le \gamma$; we omit the 
other case, which is virtually identical. By AB and AC have length between 
1 and 2C'(P); therefore, if a / then 2-^(P) < P < Tr/2. The proof follows from 
the law of sines, $|BC|^{-1}\sin\alpha = |AC|^{-1}\sin\beta$. □ 

Lemma 3.16. (Escape Lemma) For any bird $B_i$, at any time $t > 0$, 

$$\omega_i(t) \le \frac{\log t}{t}\, n^{O(n^3)}\bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{n-1} + \frac{1}{t}\Bigl(2^{O(p)} + p\,n^{O(n^3)}\bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{n-1}\Bigr).$$

Corollary 3.17. Under the default settings (2), at any time $t > 1$, 

$$\omega_i(t) \le \frac{\log t}{t}\, n^{O(n^3)}.$$

Figure 14: Birds fly increasingly in "escape" mode. 



Unlike the other factors in the upper bound, the presence of log t is an artifact 
of the proof and might not be necessary. Our approach is to exploit the "convexity" 
of single-bird transitions. One should be careful not to treat flocks as macro-birds 
and expect convexity from stationary velocities. In premixing states, all sorts of 
"nonconvex" behavior can happen. For example, consider two flocks in dimension 
1, both with positive stationary velocities. Say the one on the left has higher 
speed and catches up with the one on the right to merge into one happy flock. It 
could be the case that the stationary velocity of the combined flock is negative, ie, 
the joint flock moves left even though each one of the two flocks was collectively 
moving right prior to merging. Of course, this is a premixing aberration that we 
would not expect in the long run. 
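
The sign flip described above is easy to reproduce with stationary velocities alone. The sketch below is an illustration under loud assumptions: flocks are paths, each bird keeps weight 1/3 on itself and splits 2/3 among its neighbors (so that the stationary distribution is proportional to the degrees), and the four velocities are invented numbers. It ignores the geometry of the merge and only compares the weighted averages before and after.

```python
from fractions import Fraction as F


def path_transition(n):
    """Transition matrix of an n-bird path: weight 1/3 on oneself, 2/3 split
    equally among neighbors (a hypothetical choice of the matrix C)."""
    P = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        P[i][i] = F(1, 3)
        for j in nbrs:
            P[i][j] = F(2, 3) / len(nbrs)
    return P


def stationary_velocity(v):
    """pi^T v, with pi proportional to the degrees of the path."""
    n = len(v)
    deg = [1 if i in (0, n - 1) else 2 for i in range(n)]
    pi = [F(d, sum(deg)) for d in deg]
    P = path_transition(n)
    # Sanity check: pi is a left eigenvector of P for the eigenvalue 1.
    assert [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)] == pi
    return sum(p * x for p, x in zip(pi, v))


left = [F(4), F(-3)]         # invented velocities of the left 2-bird flock
right = [F(-3), F(7, 2)]     # invented velocities of the right 2-bird flock
print(stationary_velocity(left), stationary_velocity(right),
      stationary_velocity(left + right))   # 1/2 1/4 -3/4
```

Both flocks drift right, the left one faster, yet the merged 4-bird path drifts left: merging changes the degrees, hence the weights $\pi$, and the inner birds with negative velocities now count double.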



Proof of Lemma 3.16. From the initial conditions, we derive a trivial upper bound of $2^{O(p)}$ for constant $t$, so we may assume that $t$ is large enough and $\omega_i(t) > 0$. The line passing through $x_i(t)$ in the direction of $v_i(t)$ intersects the hyperplane $X_{d+1} = 0$ in a point $p$ at distance from the origin $\|p\|_2 = \Omega(t\,\omega_i(t))$. Recall that the bird $B_i$ started its journey at distance $2^{O(p)}$ from the origin. If it had flown in a straight line, then we would have $p = x_i(0)$, hence $\omega_i(t) = \frac{1}{t}2^{O(p)}$, and we would be done. Chances are the bird did not fly straight, however. If not, then we exhibit a virtual bird that (almost) does, at least in the sense that it does not get much closer to the origin at time 0 than a straight-line flight would. The idea is to use the flight net to follow the trajectory of a virtual bird that closely mimics a straight flight from $p$ to $x_i(t)$. 

Some words of intuition. If all times were critical and no perturbation ever took place, then it would be easy to prove by backward induction that, for all $0 \le s \le t$, the segment $p\,x_i(t)$ intersects each hyperplane $X_{d+1} = s$ in a point that lies within the convex hull of $\mathcal{N}_i(t) \cap \{X_{d+1} = s\}$. This would imply that $p$ lies in the convex hull of the birds at time 0, which again would give us the same lower bound on $\omega_i(t)$ as above (modulo the baton-passing drift). In fact, it would be possible to trace a shadow path from $x_i(t)$ down the flight net that leads to a virtual bird at time 0 that is even further away from the origin than $p$. (We use here a fundamental property of convexity, that no point can be further to a point in a convex polytope than to all of its vertices.) Unfortunately, this convexity argument breaks down because of the net's jagged paths over noncritical time periods. The jaggedness is so small, however, that it provides us enough "quasi-convexity" to rescue the argument. 




Figure 15: The shadow path attempts to follow the segment pxi{t) closely. 



First we describe the shadow path; then we show why it works. Instead of handling convexity in $\mathbb{R}^{d+1}$, we will find it easier to do this in projection. By Lemma 3.15, there exists a coordinate axis, say $X_1$, such that 

$$\omega_i(t) = O(v_i(t)_1 - w_i(t)_1). \tag{25}$$

Note that we may have to reverse the sign of $v_i(t)_1 - w_i(t)_1$, but this is immaterial. The shadow path $x_i^v(t), x_i^v(t-1), \ldots, x_i^v(0)$ describes the flight of the virtual bird backwards in time. The first two vertices are $x_i^v(t) = x_i(t)$ and $x_i^v(t-1) = x_i(t-1)$. This means the virtual bird flies down the topmost edge of $\mathcal{N}_i(t)$, ie, in the negative $X_{d+1}$ direction. Next, the following rule applies for $s = t-1, \ldots, 2$: 

• If $s$ is noncritical, $\mathcal{N}_i(t)$ has a single edge $y_{s-2}y_{s-1}$, with $(y_{s-2})_{d+1} = s - 2$. The virtual bird flies down $y_{s-2}y_{s-1}$ and we set $x_i^v(s-2) = y_{s-2}$ accordingly. 

• If $s$ is critical, $\mathcal{N}_i(t)$ has one or several edges $y'_{s-2}y_{s-1}$, with $(y'_{s-2})_{d+1} = s - 2$. The virtual bird follows the edge with maximum $X_1$-extent, ie, the one that maximizes $(y_{s-1})_1 - (y'_{s-2})_1$. (Recall that, although neither $y_{s-1}$ nor $y'_{s-2}$ might be the position of any actual bird, their difference $y_{s-1} - y'_{s-2}$ is the velocity vector $v_j(s-1)$ of some $B_j$.) We set $x_i^v(s-2) = y'_{s-2}$. 

Figure 16: Following the red shadow path. 



The virtual bird thus moves down the flight net back in time until it lands at $X_{d+1} = 0$. The resulting collection of $t + 1$ vertices forms the shadow path of the virtual bird at time $t$. Naturally, we define the velocity of $B_i^v$ at time $s > 0$ as $v_i^v(s) = x_i^v(s) - x_i^v(s-1)$. Note that $v_i^v(t) = v_i(t)$. To prove that the shadow path does not stray far from the straight-line flight from $x_i(t)$ to $p$, we focus on the difference 

$$V_s = v_i^v(s)_1 - w_i(t)_1, \tag{26}$$

for $s \ge 1$. If we could show that $V_s$ is always nonnegative then, measured in projection along the $X_1$ axis, the virtual bird would fly back in time even further away from the origin than it would if it flew straight from $x_i(t)$ to the hyperplane $X_{d+1} = 0$ in the direction of $-v_i(t)$. Except for the fact that a virtual bird at time 0 may not share the location of any actual bird (an issue we will address later), this would entirely rescue our initial argument. We cannot quite ensure the nonnegativity of $V_s$, but we come close enough to serve our purposes. 

Consider an interval $[r, s]$ consisting entirely of noncritical times (hence $r > K_t$). The flock that contains the virtual bird $B_i^v$ is invariant between times $r - K_t$ and $s$ and undergoes no perturbation during that period; furthermore, $B_i^v$ has the same incarnation as some fixed $B_j$ during the time period $[r-1, s]$. If $\chi(j)$ denotes the $n$-dimensional vector with all coordinates equal to 0, except for $\chi(j)_j = 1$, then, for $r - 1 \le u \le s$, 

$$v_j(u) = ((\chi(j)^T P^{\,u-r+K_t}) \otimes I_d)\, v(r - K_t).$$

We abuse notation and restrict $P$ and $v(r - K_t)$ to the flock of $B_j$ and not to all of $G_{r-K_t} = \cdots = G_s$. By (10, 11), we find that 

$$|v_j(u)_1 - (m_\pi[v(r - K_t)])_1| \le \|(((\chi(j)^T P^{\,u-r+K_t}) \otimes I_d) - (\pi^T \otimes I_d))\, v(r - K_t)\|_2 \le \sqrt{d}\,\|P^{\,u-r+K_t} - \mathbf{1}\pi^T\|_F\,\|v(r - K_t)\|_2 \le e^{-(u-r+K_t)\,n^{-O(1)} + O(p + \log n)}.$$

We conclude that 

$$|V_{r-1} - V_s| = |v_i^v(r-1)_1 - v_i^v(s)_1| = |v_j(r-1)_1 - v_j(s)_1| \le |v_j(r-1)_1 - (m_\pi[v(r - K_t)])_1| + |v_j(s)_1 - (m_\pi[v(r - K_t)])_1|;$$

hence, using $p \ge n^3$, 

$$|V_{r-1} - V_s| \le e^{-K_t\,n^{-O(1)} + O(p)}. \tag{27}$$

As usual, $\kappa = e^{O(n^3)}$ denotes the number of steps witnessing noise. Suppose now that $s > 1$ is critical. If no perturbation occurs at time $s - 1$, then $v_i^v(s)$ is a convex combination of the vectors of the flight net joining $X_{d+1} = s - 2$ to $X_{d+1} = s - 1$. By construction, it follows that 

$$v_i^v(s-1)_1 \ge v_i^v(s)_1.$$

If the vector is perturbed by then 

vj{s - 1)1 > vJis)i - Ci > vj{s)i - 

where St = e^^^^^ (the perturbation bound). In both cases, therefore, Vs-i > 
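$V_s - \delta_{s-1}$. Let $\zeta_t$ be the number of critical times. By (27), for all $1 \le s < t$, 

$$V_s \ge V_t - \zeta_t\, e^{-K_t\,n^{-O(1)} + O(p)} - \sum_{u=s}^{t-1} \delta_u.$$

Summing over all $s$, 

$$\sum_{s=1}^{t} V_s \ge t\,V_t - (t-1)\,\zeta_t\, e^{-K_t\,n^{-O(1)} + O(p)} - \sum_{s=1}^{t-1} s\,\delta_s.$$

Since, by assumption, $\delta_s = 0$ at all but $\kappa$ places, 

$$\sum_{s=1}^{t-1} s\,\delta_s \le \kappa\,(\log t)\, e^{O(n^3)}.$$

By (25), $V_t = v_i^v(t)_1 - w_i(t)_1 = \Omega(\omega_i(t))$; therefore, 

$$\omega_i(t) = O(V_t) = \frac{1}{t}\,O\Bigl(\Bigl|\sum_{s=1}^{t} V_s\Bigr|\Bigr) + \zeta_t\, e^{-K_t\,n^{-O(1)} + O(p)} + \frac{\kappa\log t}{t}\, e^{O(n^3)}. \tag{28}$$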
s=l 
By (24, 26), 

$$\sum_{s=1}^{t} V_s = \sum_{s=1}^{t} \{x_i^v(s)_1 - x_i^v(s-1)_1 - w_i(t)_1\} = x_i^v(t)_1 - x_i^v(0)_1 - t\,w_i(t)_1 = x_i^v(t)_1 - x_i(t)_1 - x_i^v(0)_1 = -x_i^v(0)_1.$$

Since $x_i^v(0)_1$ is the position of a virtual bird at time 0, it is tempting to infer that it is also the position of some actual bird at that time; hence $|x_i^v(0)_1| = 2^{O(p)}$. This is not quite true because adding together the velocity vectors ignores the baton-passing drift, ie, the displacements caused by switching birds. At critical times, the virtual bird gets assigned a new physical bird that is adjacent to its currently assigned feathered creature. Recall how the net $\mathcal{N}_j(t-1, K)$ is translated by $x_i(t-1) - x_j(t-1)$. Since $(i, j) \in G_{t-1}$, this causes a displacement of at most 1. Note that unlike the velocity perturbations, whose effects are multiplied by time, the drift is additive. This highlights the role of the flight net as both a kinetic and a positional object. Summing them up, we find that $|x_i^v(0)_1| \le \zeta_t + 2^{O(p)}$; hence 

$$\Bigl|\sum_{s=1}^{t} V_s\Bigr| \le \zeta_t + 2^{O(p)}. \tag{29}$$

s=l 

Recall that a time is critical if there exists either a perturbation or a network switch in the past $K_t$ steps. Recall (23) that $K_t = \lceil n^{b_0}(p + \log t) \rceil$ for a large enough constant $b_0$. By Lemma 3.5 this bounds the number of critical times by 

$$\zeta_t \le K_t\,(N(n) + \kappa) \le (p + \log t)\, n^{O(n^3)}\bigl(p + \log\tfrac{1}{\varepsilon_h}\bigr)^{n-1},$$

and the lemma follows from (28, 29). □ 



We mention a few other corollaries of Lemma 3.16 that rely on the model's as- 
sumptions. Again, recall that the sole purpose of these assumptions is to alleviate 
the notation and help one's intuition. 

Corollary 3.18. Under the default settings, at any time $t > 1$, a bird turns by an angle $\angle(v_i(t), v_i(t+1))$ that is at most 

$$\frac{\log t}{t}\, n^{O(n^3)}.$$



Proof. By ^ and 5t = i^Si gO(n-^)^ j^o bird can take a step longer than 2*^(f), 
therefore the angle between the vectors $x_i(t)$ and $x_i(t+1)$ is at most $\frac{1}{t}2^{O(p)}$. As a result, 

$$\angle(v_i(t), v_i(t+1)) \le \angle(v_i(t), x_i(t)) + \angle(x_i(t), x_i(t+1)) + \angle(x_i(t+1), v_i(t+1)) = \omega_i(t) + \angle(x_i(t), x_i(t+1)) + \omega_i(t+1),$$

and the proof follows from Corollary 3.17. The property we are using here is the triangle inequality for angles: equivalently, the fact that, among the 3 angles around a vertex of a tetrahedron in $\mathbb{R}^3$, none can exceed the sum of the others. Even though the birds live in higher dimension, our implicit argument involves only 3 points at a time and therefore belongs in $\mathbb{R}^3$. □ 



Corollary 3.19. Under the default settings, if two birds are adjacent in the flocking network at time $t > 1$, their distance prior to $t$ always remains within $n^{O(n^3)}\log t$. 

Proof. For reasons discussed above, any two birds are within distance $2^{O(p)}$ after a constant number of steps, so we may assume that $t$ is large enough. Consider the time $s$ that maximizes the distance $R_s$, for all $s \in [0, t-1]$, between the points $x_i(s)$ and $p = (s/t)\,x_i(t)$ in the hyperplane $X_{d+1} = s$. For the same reason, we may assume that $s > 1$. By Corollaries 3.17 and 3.18, 



A(xi{s)Ms + l))<<s) + A[v^{s)Ms + l)) < (30) 

s 

Set up an orthogonal coordinate system in the plane spanned by $O, p, x_i(s)$: $O$ is the origin; the $X$-axis lies in the hyperplane $X_{d+1} = 0$ and runs in the direction from $p$ to $x_i(s)$; the $Y$-axis is normal to $OX$ in the $O, p, x_i(s)$ plane. By (3), the $Y$-coordinate $p_Y$ of $p$ satisfies 

$$s \le p_Y \le s\,2^{O(p)}.$$

Let $Y = X\tan\alpha$ and $Y = X\tan\beta$ be the two lines through the origin passing through $x_i(t)$ and $x_i(s)$, respectively. Setting $Y = p_Y$ we find that $p_X = p_Y/\tan\alpha$ and $x_i(s)_X = p_Y/\tan\beta$; therefore 

$$R_s \le \Bigl|\frac{1}{\tan\beta} - \frac{1}{\tan\alpha}\Bigr|\, s\,2^{O(p)} \le \frac{\sin(\alpha - \beta)}{(\sin\alpha)(\sin\beta)}\, s\,2^{O(p)}.$$



By construction, the velocity Vi{s + 1) cannot take the bird Bi outside the elliptical 
cylinder that is centered at the line (O, Xi{t)) with the point Xj(s) on its boundary 
and that intersects X^+i = in a disk of radius Rg = \pxi{s)\. It follows that the 
normal projection w of Vi{s + 1) on the {X, y)-plane forms an angle 7 with Xi{s) 
at least equal to the angle between the two lines Y = X tan a and Y = X tan /3, 



which is a — /3. By (30), therefore, 

a- P <j < Z{x,{s),Vi{s + l) < ^n«("'). 

s 

Birds are at most away from the origin at time and, by take no step 

larger than that bound. It follows that both a and /? are at least 2~*^^P\ therefore 

iJ, <20(P) nO("')logt. 
If two birds $B_i$ and $B_j$ share an edge in a flock at time $t$, then $\|x_i(t) - x_j(t)\|_2 \le 1$; so, by the triangle inequality, at any time $1 \le s < t$, 

$$\|x_i(s) - x_j(s)\|_2 \le 2^{O(p)}\, n^{O(n^3)}\log t + \frac{s}{t}\,\|x_i(t) - x_j(t)\|_2,$$

which, by the default settings (2), proves the lemma. □ 




Figure 17: Two birds can't stray too far from each other if they're ever to meet again. 



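Suppose that birds $B_i$ and $B_j$ are at distance at most $D$ at time $t > 0$. (No assumption is made whether they belong to the same flock or whether (2) holds.) By (24) and Lemmas 3.4, 3.15, 3.16, 

$$\Delta_{ij}(t) \le \|v_i(t) - v_j(t)\|_2 \le \bigl\|v_i(t) - \tfrac{1}{t}x_i(t)\bigr\|_2 + \bigl\|v_j(t) - \tfrac{1}{t}x_j(t)\bigr\|_2 + \tfrac{1}{t}\,\|x_i(t) - x_j(t)\|_2 \le (\omega_i(t) + \omega_j(t))\,2^{O(p)} + \frac{D}{t}. \tag{31}$$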



Corollary 3.20. Under the default settings, at any time $t > 1$, the difference in stationary velocities between two distinct flocks joining into a common one at time $t + 1$ has Euclidean norm at most $\frac{\log t}{t}\, n^{O(n^3)}$. 



Proof. The stationary velocity of a flock is a convex combination of its con- 
stituents' individual velocities, so the difference in stationary velocities cannot 
exceed, length- wise, the maximum difference between individual ones. By 1^ and 
the connectivity of flocks, the distance at time t between any two birds in the 
common flock at time $t + 1$ cannot exceed $D = n + 2^{O(p)}$. The lemma follows from (31) and Corollary 3.17. □ 
We define the fragmentation breakpoint $t_f$ as 



i; = i^cJn^/"'(p + logi^r, (32) 



where Cf is a large enough constant. Setting D = 1 in (31), we find that, by 
hysteresis and the Escape Lemma, the edges of Gt can break only if t < tf. Past 
the fragmentation breakpoint, flocks can only merge. 

Lemma 3.21. At any time t > tf, the flocking network Gt may gain new edges 
but never lose any. 



The Escape Lemma tells us that, after the fragmentation breakpoint, birds fly almost in a straight line and both their positions and velocities can be predicted with low relative error. From a physical standpoint, they have already converged. The flocking network may still change, however. It may keep doing so for an unbelievably long time. This is what we show in the next section. Note that, under the default settings of (2), the fragmentation breakpoint $t_f$ is $n^{O(n^3)}$. 



3.3 Iterated Exponential Growth 

To pinpoint the exact convergence time requires some effort, so it is helpful to break down the task into two parts. We begin with a proof that the flocking network reaches steady state after a number of steps equal to a tower-of-twos of linear height. This allows us to present some of the main ideas and prepare the grounds for the more difficult proof of the logarithmic height in §3.4. The main tools we use in this section are the rationality of limit configurations and root separation bounds from elimination theory. Our investigation focuses on the post-fragmentation phase, ie, $t > t_f$. We do not yet adopt the assumptions of (2); in particular, we use the definition of $t_f$ given in (32). 



Lemma 3.22. Consider two birds adjacent at time $t$ but not $t - 1$. Assume that the flocks that contain them remain invariant and noise-free during the period $[t_1, t - 1]$, where $t_f < t_1 < t - 1$. If, at time $t - 1$, the birds are in different flocks with distinct stationary velocities, then $t \le n^{O(t_1 n)}$; otherwise, $t \le t_1 2^{n^{O(1)}}$.



Proof. Assume that the flocking network $G_t$ stays invariant during the period $[t_1, t - 1]$. Consider two birds $B_i$ and $B_j$ that are adjacent in $G_t$ but not during $[t_1, t - 1]$. The two birds may or may not be in the same flock at time $t - 1$. Let the flock for $B_i$ (resp. $B_j$) consist of $m$ (resp. $m'$) birds: $m = m'$ if the birds are in the same flock, else $m + m' \le n$. By abuse of notation, we use the terminology of (9), ie, $P$, $\pi$, $C$, $u_k$, $\lambda_k$, as well as $v(t)$, to refer to the flock of $m$ birds, and we add primes to distinguish it from the flock of $B_j$. We wish to place an upper bound on $t - t_1$. Let $\chi(i)$ denote the $m$-dimensional vector with all coordinates equal to 0, except for $\chi(i)_i = 1$. By (9, 15), for $t > t_1$,



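$$x_i(t) = x_i(t_1) + \sum_{s=0}^{t-t_1-1} \big((\chi(i)^T P^s) \otimes I_d\big)\, v(t_1+1) = x_i(t_1) + (t - t_1)\, y + \sum_{k=2}^{m} \frac{1 - \lambda_k^{t-t_1}}{1 - \lambda_k}\, \Phi_k,$$

where

$$y = (\pi^T \otimes I_d)\, v(t_1+1) = m_\pi[v(t_1+1)]; \qquad \Phi_k = \big((\chi(i)^T C^{1/2} u_k u_k^T C^{-1/2}) \otimes I_d\big)\, v(t_1+1).$$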

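Note that, by (9, 12),

$$\sum_{k=2}^{m} \frac{\Phi_k}{1 - \lambda_k} = \sum_{s \ge 0} \sum_{k=2}^{m} \lambda_k^s\, \Phi_k = \big((\chi(i)^T \Gamma) \otimes I_d\big)\, v(t_1+1); \qquad (33)$$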



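therefore,

$$x_i(t) = x_i(t_1) + \big((\chi(i)^T \Gamma) \otimes I_d\big)\, v(t_1+1) + (t - t_1)\, y - \sum_{k=2}^{m} \frac{\lambda_k^{t-t_1}}{1 - \lambda_k}\, \Phi_k.$$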



Adding primes to distinguish between the flocks of $B_i$ and $B_j$ (if need be), we find that

$$x_i(t) - x_j(t) = A + B(t - t_1) - \sum_{k=1}^{m_0} \Psi_k\, \mu_k^{t-t_1}, \qquad (34)$$



where

(i) $A = x_i(t_1) - x_j(t_1) + ((\chi(i)^T \Gamma) \otimes I_d)\, v(t_1+1) - ((\chi'(j)^T \Gamma') \otimes I_d)\, v'(t_1+1)$: By Lemma 3.2, the vectors $v(t_1+1)$, $v'(t_1+1)$, $x_i(t_1)$, and $x_j(t_1)$ have CD-rational coordinates over $O(t_1 n \log n + pn)$ bits, which is also $O(t_1 n \log n)$, since, by (32), $t_1 > t_f > p$. In view of Lemma 3.12, this implies that the same is true of the vector $A$.

(ii) $B = y - y'$: The stationary distribution $\pi = (\mathrm{tr}\, C^{-1})^{-1} C^{-1} \mathbf{1}$ is a CD-rational vector over $O(n \log n)$ bits. Together with Lemma 3.2, this implies that $B$ has CD-rational coordinates over $O(t_1 n \log n)$ bits; hence either $B = 0$ or $\|B\|_2 \ge n^{-O(t_1 n)}$.

(iii) $\mu_1 \ge \cdots \ge \mu_{m_0}$: Each $\mu_k$ is an eigenvalue $\lambda_l$ or $\lambda'_{l'}$ ($l, l' > 1$) and $|\mu_k| < 1$. Their number $m_0$ is either $m - 1$ (if the two birds $B_i$ and $B_j$ belong to the same flock) or $m + m' - 2$, otherwise.
(iv) Each $\Psi_k$ is a $d$-dimensional vector of the form $\Phi_l/(1 - \lambda_l)$ or $-\Phi'_{l'}/(1 - \lambda'_{l'})$.

Since the eigenvalues are bounded away from 1 by $n^{-O(1)}$ (Lemma 3.7), it follows from ([s]), the submultiplicativity of the Frobenius norm, and $p \ge n^3$ that $\|\Psi_k\|_2 = 2^{O(p)}$. In the same vein, we note for future reference that

$$\Big\| \sum_{k=1}^{m_0} \Psi_k\, \mu_k^{t-t_1} \Big\|_2 \le e^{-(t-t_1)\, n^{-O(1)} + O(p)} = 2^{O(p)}. \qquad (35)$$
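The facts about the stationary distribution used in (i)-(iv) can be checked on a toy instance. The sketch below assumes a transition matrix of the form $P = I - CL$ (with $L$ the graph Laplacian and $C$ a positive diagonal matrix) and uses exact rational arithmetic. The 3-bird path graph, the step sizes `c`, and the initial one-dimensional velocities are hypothetical choices made for illustration only; they do not come from the text. It verifies that $\pi = (\mathrm{tr}\, C^{-1})^{-1} C^{-1} \mathbf{1}$ is stationary, that $\pi^T v$ is invariant, and that the bit length of the velocities grows at most linearly with time.

```python
from fractions import Fraction

def transition_matrix(adj, c):
    """P = I - C L for an undirected graph: P[i][j] = c[i] on edges, rows sum to 1."""
    n = len(adj)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            P[i][j] = c[i]
        P[i][i] = 1 - c[i] * len(adj[i])
    return P

def stationary(c):
    """pi = (tr C^{-1})^{-1} C^{-1} 1, ie, pi_i is proportional to 1/c_i."""
    total = sum(1 / ci for ci in c)
    return [(1 / ci) / total for ci in c]

def step(P, v):
    """One velocity update v <- P v."""
    n = len(v)
    return [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical flock: a path 0 - 1 - 2 with illustrative step sizes.
adj = [[1], [0, 2], [1]]
c = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 2)]
P = transition_matrix(adj, c)
pi = stationary(c)

v = [Fraction(0), Fraction(1), Fraction(5)]   # initial velocities (dimension d = 1)
m = dot(pi, v)                                # stationary velocity pi^T v
steps = 60
for _ in range(steps):
    v = step(P, v)

# Every entry of P has denominator dividing 6, so denominators divide 6^steps.
max_bits = max(x.denominator.bit_length() for x in v)
```

Here `m` plays the role of $y$ in the proof: it never changes, while the individual velocities approach it geometrically and their encoding length grows by a bounded number of bits per step.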

We distinguish among three cases:

Case I. $B \ne 0$: The two flocks must be distinct, for having the two birds in the same flock would imply that $\pi = \pi'$ and $v(t_1+1) = v'(t_1+1)$; hence $y = y'$. By (i, ii), $\|A\|_2 \le n^{O(t_1 n)}$ and $\|B\|_2 \ge n^{-O(t_1 n)}$. If the birds are to be joined in $G_t$, then $\mathrm{DIST}_t(B_i, B_j) = \|x_i(t) - x_j(t)\|_2 \le 1$. By (32), $t_1 > t_f > p$; hence $2^{O(p)} = n^{O(t_1 n)}$. It follows from (33, 35) that $t - t_1 \le n^{O(t_1 n)}$. Note that, for the lower bound on $\|B\|_2$ to be tight, the flock would have to be able to generate numbers almost as small as Lemma 3.2 allows. For this to happen, energy must shift toward the dominant eigenvalue. This spectral shift occurs only in a specific context, which we examine in detail in the next section.
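The last step of Case I can be unpacked as follows. This is a worked restatement of the argument, with all constants absorbed into the big-oh exponents; it uses only (34), (35), the bounds in (i, ii), and $p < t_1$.

```latex
\begin{align*}
B\,(t - t_1) &= \bigl(x_i(t) - x_j(t)\bigr) - A + \sum_{k=1}^{m_0} \Psi_k\,\mu_k^{\,t-t_1}
  && \text{by (34)} \\
\|B\|_2\,(t - t_1) &\le \|x_i(t) - x_j(t)\|_2 + \|A\|_2
  + \Bigl\|\sum_{k=1}^{m_0} \Psi_k\,\mu_k^{\,t-t_1}\Bigr\|_2 \\
 &\le 1 + n^{O(t_1 n)} + 2^{O(p)} = n^{O(t_1 n)}
  && \text{by (i), (35), and } p < t_1 \\
t - t_1 &\le \frac{n^{O(t_1 n)}}{\|B\|_2}
  \le n^{O(t_1 n)} \cdot n^{O(t_1 n)} = n^{O(t_1 n)}
  && \text{by (ii)}
\end{align*}
```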

Case II. $B = 0$ and $\|A\|_2 \ne 1$: By (i), $\|A\|_2$ is bounded away from 1 by $n^{-O(t_1 n)}$. It follows from (34, 35) and the triangle inequality that

$$\big|\, \|x_i(t) - x_j(t)\|_2 - 1 \,\big| \ge \big|\, \|A\|_2 - 1 \,\big| - \big|\, \|x_i(t) - x_j(t)\|_2 - \|A\|_2 \,\big| \ge n^{-O(t_1 n)} - \Big\| \sum_k \Psi_k\, \mu_k^{t-t_1} \Big\|_2.$$

Since $t_1 > p$, this implies that, for a large enough constant $b_0$, the distance between the two birds remains bounded away from 1 by $n^{-O(t_1 n)}$ at any time $s \ge t_1 n^{b_0}$. Not only that, but the sign of $\mathrm{DIST}_s(B_i, B_j) - 1$ can no longer change after time $t_1 n^{b_0}$. Indeed, for any $s > t_1 n^{b_0}$, the distance between times $s - 1$ and $s$ varies by an increment of $\Delta_{ij}(s)$, where, by (35),



Aij{s) = I ||Xi(s) - Xj{s)\\2 - \\xi{s - 1) - Xj{s - 1)||2| 

<ll Ek'^kf^l'-'^h + W E.^fc/^r*Ml2 

< g-(s-ti)n-0(i)+0(p) < g-tm". 

With $n$ assumed large enough, this ensures that, past time $t_1 n^{b_0}$, the distance can never cross the value 1. Thus, if the two birds have not gotten within distance 1 of each other by time $t_1 n^{b_0}$, they never will, at least while their respective flocks remain invariant. We conclude that $t \le t_1 n^{O(1)}$.

Case III. $B = 0$ and $\|A\|_2 = 1$: The distance between the two birds tends toward 1. The concern is that the two birds might stay safely away from each other for a long period of time and then suddenly decide to get close enough to share an edge. The rationality of the limit configuration is insufficient to prevent this. Only a local analysis of the convergence can show that a long-delayed pairing is impossible. We wish to prove that, if $\mathrm{DIST}_s(B_i, B_j)$ is to fall below 1 for $s > t_1$, this must happen relatively soon. Recall that, by (34),

$$x_i(s) - x_j(s) = A - \sum_{k=1}^{m_0} \Psi_k\, \mu_k^{s-t_1},$$

where $A$ is a unit vector. We investigate the behavior of the birds' distance locally around 1.

$$\|x_i(s) - x_j(s)\|_2^2 = 1 - 2 \sum_{k} A^T \Psi_k\, \mu_k^{s-t_1} + \sum_{k,k'} \Psi_k^T \Psi_{k'}\, (\mu_k \mu_{k'})^{s-t_1}.$$

Let $1 > \rho_1 > \cdots > \rho_N > 0$ be the distinct nonzero values among $\{|\mu_k|, |\mu_k \mu_{k'}|\}$ ($N < n^2$). These absolute values may appear with a plus or minus sign (or both) in the expression above, so we rewrite it as

$$\|x_i(s) - x_j(s)\|_2^2 - 1 = \sum_{k=1}^{N} \Upsilon_k\, \rho_k^{s-t_1}, \qquad (36)$$

where each

$$\Upsilon_k = \Upsilon_k^+ + (-1)^s\, \Upsilon_k^-$$

corresponds to a distinct $\rho_k$. We distinguish between odd and even values of $s$ so as to keep each $\Upsilon_k$ time-invariant. We assume that $s$ is even and skip the odd case because it is similar. Of course, we may also assume that each $\Upsilon_k = \Upsilon_k^+ + \Upsilon_k^-$ is nonzero. We know that $\sum_k \Upsilon_k\, \rho_k^{s-t_1}$ tends to 0 as $s$ goes to infinity, but the issue is how so. To answer this question, we need bounds on eigenvalue gaps and on $|\Upsilon_k|$. Tighter results could be obtained from current spectral technology, but they would not make any difference for our purposes, so we settle for simple, conservative estimates.



Lemma 3.23. For all $k > 1$ and $k \ge 1$, respectively,

$$\rho_k \le (1 - 2^{-n^{O(1)}})\, \rho_1 \quad\text{and}\quad 2^{-t_1 2^{n^{O(1)}}} \le |\Upsilon_k| = 2^{O(p)}.$$



Proof. We begin with the eigenvalue gap.* For this we use a conservative version of Canny's root separation bound [2,30]: given a system of $m$ integer-coefficient polynomials in $m$ variables with a finite set of complex solution points, any nonzero coordinate has modulus at least

$$2^{-\ell D^{O(m)}}, \qquad (37)$$

where $D - 1$ is the maximum degree of any polynomial and $\ell$ is the number of bits needed to represent any coefficient. Any difference $\rho_k - \rho_l$ can be expressed by a quadratic polynomial, $z = z_1 z_2 - z_3 z_4$, where each $z_i$ is either 1 or a root of the characteristic polynomial $\det(P - \lambda I_n)$. The elements of $P$ are CD-rationals over $O(n \log n)$ bits, so by the Hadamard bound [30] the roots of $\det(P - \lambda I_n)$ are also those of a polynomial of degree $n$ with integer coefficients over $O(n^2 \log n)$ bits; therefore, $m \le 5$; $D = n + 1$; and $\ell = n^{O(1)}$. This proves that the minimum gap between two $\rho_k$'s is $2^{-n^{O(1)}}$. Since $\rho_1 < 1$, we find that, for $k > 1$,

$$\rho_k \le (1 - 2^{-n^{O(1)}})\, \rho_1,$$

which proves the first part of the lemma.

* For the purpose of this lemma, we again abuse notation by letting $P$ and $n$ pertain to the flock of either one of the two birds. This will help the reader keep track of the notation while, as a bonus, releasing $m$ as a variable.

By (iv), $\|\Psi_k\|_2 = 2^{O(p)}$; therefore, by Cauchy-Schwarz and the inequalities $\rho_k < 1$ and $p \ge n^3$, the same bound of $2^{O(p)}$ applies to any $|\Upsilon_k|$, which proves the upper bound of the lemma. We now prove that $|\Upsilon_k|$ cannot be too small. Recall that it is the sum/difference of inner products between vectors in $\{A, \Psi_h\}$. We know from (iv) that $\Psi_h$ is of the form $\Phi_l/(1 - \lambda_l)$ or $-\Phi'_{l'}/(1 - \lambda'_{l'})$. We assume the former without loss of generality. By (9, 12),

$$\Gamma = \sum_{r=2}^{n} \sum_{s \ge 0} \lambda_r^s\, C^{1/2} u_r u_r^T C^{-1/2}.$$

In view of (iv) and (33), it then follows that

$$\Psi_h = \frac{1}{1 - \lambda_l} \big((\chi(i)^T C^{1/2} u_l u_l^T C^{-1/2}) \otimes I_d\big)\, v(t_1+1) = \sum_{r=2}^{n} \sum_{s \ge 0} \big((\chi(i)^T \lambda_r^s\, C^{1/2} u_r u_r^T C^{-1/2}\, C^{1/2} u_l u_l^T C^{-1/2}) \otimes I_d\big)\, v(t_1+1)$$

$$= \big((\chi(i)^T \Gamma\, C^{1/2} u_l u_l^T C^{-1/2}) \otimes I_d\big)\, v(t_1+1) = \big((\chi(i)^T \Gamma) \otimes I_d\big)\, w,$$

where $w = ((C^{1/2} u_l u_l^T C^{-1/2}) \otimes I_d)\, v(t_1+1)$. By Lemma 3.2, $v(t_1+1)$ is a vector with CD-rational coordinates over $O(t_1 n \log n)$ bits; remember that $t_1 > p$. By Lemma 3.12, the elements of $\Gamma$ are CD-rationals encoded over $O(n \log n)$ bits. Any coordinate of $\Psi_h$ can thus be written as a sum of at most $n^2$ terms of the form $R_i \alpha_i y_i z_i$, where:

• All the $R_i$'s are products of an entry of $\Gamma$ with a coordinate of $v(t_1+1)$, hence CD-rationals over $O(t_1 n \log n)$ bits;

• $\alpha_i$ is the square root of a rational $c_a/c_b$ over $O(\log n)$ bits;

• $y_i, z_i$ are two coordinates of $u_l$. Recall that, by ([s]), $u_l$ is a unit eigenvector of $C^{-1/2} P C^{1/2}$.

By (i), $A$ is a vector with CD-rational coordinates over $O(t_1 n \log n)$ bits. It follows that $\Upsilon_k$ is a sum of $n^{O(1)}$ terms of the form $S_i \gamma_i y_i z_i y'_i z'_i$, where:

• All the $S_i$'s are CD-rationals over $O(t_1 n \log n)$ bits;

• $\gamma_i$ is the square root of an $O(\log n)$-bit rational, ie, a number of the form $\sqrt{(c_a/c_b)(c'_{a'}/c'_{b'})}$;

• $y_i, z_i, y'_i, z'_i$ are coordinates of the eigenvectors (or 1, to account for $A^T \Psi_h$).

It is straightforward (but tedious) to set up an integer-coefficient algebraic system over $m = n^{O(1)}$ variables that includes $\Upsilon_k$ as one of the variables. The number of equations is also $m$ and the maximum degree is $n$. All the coefficients are integers over $O(t_1 n \log n + n^{O(1)})$ bits. Rather than setting up the system in full, let us briefly review what it needs to contain:

1. $\Upsilon_k$ is a sum of $n^{O(1)}$ quintic monomials $S_i \gamma_i y_i z_i y'_i z'_i$, where the $S_i$'s are CD-rationals over $O(t_1 n \log n)$ bits.

2. Each $\gamma_i$ is of the form $\sqrt{a/b}$, where $a, b$ are $O(\log n)$-bit integers. We express it by the equation $b \gamma_i^2 = a$. (This yields two roots, but any solution set is fine as long as it is finite and contains those we want.)

3. The $y_i, z_i, y'_i, z'_i$ are coordinates of the eigenvectors $u_l$ of $C^{-1/2} P C^{1/2}$. We specify them by first defining the eigenvalues $\lambda_1, \ldots, \lambda_n$ and then the eigenvectors:

$$\det(P - \lambda_i I_n) = 0; \qquad C^{-1/2} P C^{1/2} u_i = \lambda_i u_i; \qquad \|u_i\|_2^2 = 1, \ \text{and} \ u_i^T u_j = 0 \quad (1 \le i < j \le n).$$

The issue of multiplicity arises. If the kernels of the various $P - \lambda_i I_n$ are not of dimension 1, we must throw in cutting planes to bring down their sizes. We add in coordinate hyperplanes to the mix until we get the right dimension. We then repeat this process for each multiple eigenvalue in turn. (Of course, we do all this prior to forming the vectors $\Psi_h$.) We rewrite each eigensystem as $P v_i = \lambda_i v_i$, where $v_i = C^{1/2} u_i$, and again we square the latter set of equations to bring them in polynomial form.

Once we reduce all the rational coefficients to integers, we can use the separation bound (37), for $m = n^{O(1)}$, $D = n + 1$, and $\ell = O(t_1 n \log n + n^{O(1)})$, which is $O(t_1 n \log n)$. This gives us a bound on the modulus of any nonzero coordinate of the solution set; hence on $|\Upsilon_k|$. □
By (36), it follows from the lemma that $\|x_i(s) - x_j(s)\|_2^2 - 1 = \Upsilon_1\, \rho_1^{s-t_1} (1 + \zeta)$, where

$$|\zeta| \le e^{-(s - t_1)\, 2^{-n^{O(1)}} + t_1 2^{n^{O(1)}}} = o(1),$$

for $s > t_1 2^{n^{b_1}}$, with $b_1$ being a large enough constant. The same argument for odd values of $s$ shows that, after $t_1 2^{n^{b_1}}$, either $\|x_i(s) - x_j(s)\|_2$ stays on one side of 1 forever or it constantly alternates (at odd and even times). Since the birds are joined in $G_t$ but not in $G_s$ ($t_1 \le s < t$), it must be the case that $t \le t_1 2^{n^{O(1)}}$. This concludes Case III.

Putting all three results together, we find that the bound from Case I is the most severe, $t \le n^{O(t_1 n)}$, while Case II is the most lenient. When the two birds are in the same flock at time $t - 1$, however, the bound from Case III takes precedence. □



Lemmas 3.21 and 3.22 show that all network switches take place within the first $t_\infty = 2 \uparrow\uparrow O(n)$ steps. Perturbations occur within $n^{O(1)}$ steps of a switch and do not affect Lemma 3.2. The previous argument thus still applies and shows that the same upper bound also holds in the noisy model. After time $t_\infty$ the flocking network remains invariant. By virtue of (18), the limit trajectory of the birds within a given flock is expressed as

$$x(t) = x' + ((\mathbf{1} \pi^T) \otimes I_d)\, x(t_\infty) + (t - t_\infty)\, ((\mathbf{1} \pi^T) \otimes I_d)\, v(t_\infty + 1),$$

where the stationary distribution $\pi$ refers to the bird's flock (and therefore should be annotated accordingly).
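The bounds in this part of the paper are expressed with the tower-of-twos notation $2 \uparrow\uparrow h$ and, later, with its inverse $\log^*$. The following minimal sketch makes both concrete. The function names `tower` and `log_star` are ours, logarithms are taken to the base 2, and $\log^*$ is assumed to count the number of logarithm iterations needed to bring the value down to 1 or below.

```python
import math

def tower(height):
    """Return 2 ↑↑ height, a tower of `height` twos (tower(0) = 1)."""
    value = 1
    for _ in range(height):
        value = 2 ** value
    return value

def log_star(t):
    """Number of times log2 must be applied to t before the value is at most 1."""
    count = 0
    while t > 1:
        t = math.log2(t)
        count += 1
    return count

# The first few towers; tower(5) = 2**65536 already has 19729 decimal digits.
heights = [tower(h) for h in range(5)]
```

Since `log_star(tower(h)) == h` for small `h`, a bound of the form $t \le 2 \uparrow\uparrow h$ is the same as saying $\log^* t \le h$, which is how the recurrence of §3.4 is phrased.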

3.4 Tower of Logarithmic Height 

We prove that the tower-of-twos has height less than $4 \log n$. To simplify the notation (a decision whose wisdom the reader will soon come to appreciate), we now adopt the assumptions of (2). As we discussed earlier, this means setting the fragmentation breakpoint $t_f = n^{f_0 n^3}$ for some large enough constant $f_0$. The improvement rests on a more careful analysis of the merges subsequent to the fragmentation breakpoint $t_f$. Note that in the proof of Lemma 3.22 the bottleneck lies in Case I: specifically, in the lower bound on $\|B\|_2$ and the upper bound on $\|A\|_2$. The latter can be improved easily by invoking the Escape Lemma. To get around $\|B\|_2$ requires more work. Recall from (17) that the position vector of one flock is given by

$$x(t) = a + bt + (\Gamma_t \otimes I_d)\, v,$$

where the matrix $\Gamma_t$ describes a damped oscillator. The stationary velocity $b$ is formed by the first spectral coordinates, one for each dimension, associated with the eigenvalue 1. The oscillator involves only the spectral coordinates corresponding to the subdominant eigenvalues ($|\lambda_k| < 1$).
The Combinatorics of the Spectral Shift. The reason flocks take longer to
merge into larger flocks is that they fly in formations increasingly parallel to one 
another. The term bt grows linearly in t, so an iterated exponential growth can 
only come from the oscillator. Of course, the angle between the flight directions of 
two flocks is given by the stationary velocities. Therefore, for the angles to inherit 
an exponentially decaying growth, it is necessary to transfer the fast-decaying 
energy of the oscillators to the stationary velocities themselves. In other words, the 
collision between two flocks must witness a spectral shift from the "subdominant" 
eigenspace to the stationary velocities. Small angles are achieved by getting two 
stationary velocities to be very close to each other. Indeed, the spectral shift does 
not cause a decay of the velocities themselves but of pairwise differences. Recall 
that flocking is invariant under translation in velocity space; so any interesting 
phenomenon can be captured only by differences. 

Let b be the stationary velocity of the new flock formed by two flocks joining 
together after flying on their own during t steps. Let b' be the stationary velocity 
resulting from two other flocks flying in similar conditions. The spectral shift will 
ensure that the difference b — b' has Euclidean norm e~^"' , ie, exponentially 
small in the flight time. One should think of it as an energy transfer from the 
subdominant eigenspaces to the stationary velocities. The challenge is to show 
that this transfer can occur only under certain conditions that greatly restrict its 
occurrence. This requires a combinatorial investigation of the spectral shift.




Figure 18: Without spectral shift, the difference between stationary veloci- 
ties becomes null and the two flocks never meet. The spectral shift resupplies 
the stationary velocities with the fast-decaying energy located in the subdom- 
inant part of the spectrum. This causes a slight inflection of the trajectory 
(black lines). 
We model the sequence of post-fragmentation breakpoint merges by a forest $\mathcal{F}$: each internal node $a$ corresponds to a flock $F_a$ of birds formed at time $t_a > t_f$. If $a$ is a leaf of $\mathcal{F}$ then its formation time $t_a$ is at most $t_f$. A node with at least two children is called branching. A nonbranching node represents the addition of edges within the same flock. Our analysis will focus on branching nodes with no more than two children. In general, of course, this number can be arbitrarily high, as several flocks may come together to merge simultaneously. We will see later how to break down multiple aggregations of this form into pairwise merges.

Let $L(t)$ denote the minimum value of $n_a$, the number of birds in $F_a$, over all branching nodes $a$ and all initial conditions subject to (2), such that $t_a \ge t$. Our previous upper bound shows that $L(t) = \Omega(\log^* t)$. We strengthen this:

Lemma 3.24. $L(t) \ge (1.1938)^{\log^* t - O(\log\log n)}$, where $1.1938$ is the unique real root of $x^5 - x^2 - 1$.

This implies that the last merge must take place before time $t$ such that $L(t) \le n$; hence $t \le 2 \uparrow\uparrow (3.912 \log n)$. By Lemma 3.22, multiplying this quantity by $2^{n^{O(1)}}$ suffices to account for the network switches following the last merge. As observed earlier, noise has no effect on this bound. This proves the upper bound claimed earlier. □



The Intuition. Think of the group of birds as a big-number engine. How many bits can $n$ birds encode in their velocities at the last network switch? The previous argument shows that this number cannot exceed a tower-of-twos of linear height. We show that in fact the height is only logarithmic. What keeps this number down is the presence of residues. We begin with a toy problem that has no direct connection to bird flocking but illustrates the notion of residue. Consider an $n$-leaf binary tree whose nodes are associated with polynomials in $\mathbb{R}[X]$. Each leaf is assigned its own polynomial of degree 1. The polynomial $p_v$ at an internal node $v$ is defined recursively by combining those at the children $u, w$:

$$p_v = p_u \oplus p_w = p_u + p_w + (p_u - p_w)\, x^{2^{h(p_u - p_w)}},$$

where $h(p)$ is 0 if $x = 0$ is not a root of $p$ and $h(p)$ is its multiplicity otherwise; in other words, it is the lowest degree among the (nonzero) monomials of $p$. How big can the degree of $p_{\mathrm{root}}$ be? It is immediate to achieve a degree that is a tower-of-twos of logarithmic height. Take a complete binary tree and assign the polynomial $(-1)^{l(v)} x$ to a leaf $v$, where $l(v)$ is the number of left turns from the root to the leaf. We verify by induction that the polynomials at level $k$ are of the form $\pm c_k x^{d_k}$, where $d_1 = 1$ and, for $k > 1$,

$$c_k x^{d_k} = \pm 2 c_{k-1}\, x^{d_{k-1} + 2^{d_{k-1}}}.$$

This shows that $d_k = d_{k-1} + 2^{d_{k-1}}$; hence the stated tower-of-twos of logarithmic height. Couldn't we increase the height by choosing a nonbalanced tree? The answer is no, but not for the obvious reason. The "obvious reason" would be that to go from a node $u$ of degree $d$ to a parent of degree $2^d$ requires not just one but two children $u, v$ of degree $d$. Nice idea. Unfortunately, it is not true:



(38) 



Note, however, that if we try to repeat this trick we get 



= x''(l + x^'' 



which increases the degree by only a constant factor. The reason for this is that during the exponential jump in (38) the polynomial inherited a residue, ie, the "low-degree" monomial $x^d$, which will hamper future growth until it is removed. But to do so requires another "big-degree" child. This residue-clearing task is what keeps the tower's height logarithmic. We prove this below.
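The toy problem is easy to experiment with. The sketch below represents a polynomial as a dictionary mapping degrees to nonzero coefficients and assumes the combination rule $p_u \oplus p_w = p_u + p_w + (p_u - p_w)\, x^{2^{h(p_u - p_w)}}$, with $h$ the lowest degree of a nonzero monomial. It builds the complete binary tree with leaves $(-1)^{l(v)} x$ and reads off the degrees level by level. The final two lines are a hypothetical illustration of a residue (a lopsided merge with the zero polynomial, followed by a merge with $x$); they are our own example and are not meant to reproduce equation (38).

```python
def add(p, q, sign=1):
    """Return p + sign * q for polynomials stored as {degree: nonzero coefficient}."""
    r = dict(p)
    for d, c in q.items():
        r[d] = r.get(d, 0) + sign * c
        if r[d] == 0:
            del r[d]
    return r

def lowest_degree(p):
    """h(p): lowest degree among the nonzero monomials (0 for the zero polynomial)."""
    return min(p) if p else 0

def degree(p):
    return max(p) if p else -1

def combine(pu, pw):
    """pu (+) pw = pu + pw + (pu - pw) * x^(2^h(pu - pw))."""
    diff = add(pu, pw, -1)
    shift = 2 ** lowest_degree(diff)
    shifted = {d + shift: c for d, c in diff.items()}
    return add(add(pu, pw), shifted)

def build(depth, left_turns=0):
    """Polynomial at the root of a complete tree with leaves (-1)^(left turns) * x."""
    if depth == 0:
        return {1: (-1) ** left_turns}
    left = build(depth - 1, left_turns + 1)
    right = build(depth - 1, left_turns)
    return combine(left, right)

degrees = [degree(build(k)) for k in range(4)]   # d_k = d_{k-1} + 2^{d_{k-1}}

lopsided = combine({3: 1}, {})       # x^3 (+) 0 = x^3 + x^11: the residue x^3 remains
stalled = combine(lopsided, {1: 1})  # the residue blocks a second exponential jump
```

In the balanced tree the two children cancel exactly, so nothing of low degree survives and the degree jumps from $d$ to $d + 2^d$ at every level. In the lopsided merge the low-degree monomial survives, and the next merge raises the degree only from 11 to 13.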

Theorem 3.25. A tree of $n$ nodes can produce only polynomials of degree at most $2 \uparrow\uparrow O(\log n)$.

Proof. Let $L(d)$ be the minimum number of leaves needed to produce at the root a polynomial of degree at least $d$. We prove that

$$L(d) \ge 2^{\Omega(\log^* d)}, \qquad (39)$$



from which the theorem follows. Let $v$ be the root of the smallest $n$-leaf tree that achieves $d_v \ge d$, where $d_v$ denotes the degree of $p_v$. Let $u, w$ be the children of $v$, with $d_u \ge d_w$, and let $y, z$ be the children of $u$ with $d_y \ge d_z$. We assume that $d$ is large enough, so all these nodes exist. Note that

$$d_v \le d_u + 2^{h(p_u - p_w)} \quad\text{and}\quad d_u \le d_y + 2^{h(p_y - p_z)}. \qquad (40)$$



Assume that 



d^^du, < logloglog^^^,; 
dy < < du < \f(U- 



(41) 



In view of ( 40 ) , this shows that d^ > d^, and dy > d^. This first inequality implies 
that du + = d; therefore, by (41 ), h{pu - Pw) > 2 log dy. In other words. 



n — n -L T^0°Sdv)/2] 



(42) 



for some polynomial qu 7^ 0. Repeating the same line of reasoning at node u, we 
derive the identity dy + 2^^y~P''^ = du from the strict inequality dy > dz- It follows 
from (|40l 1411) that 



dz < log log log < I log du < h{py - Pz) < logd„ < 5 log 4. 
This implies two things: first, by $d_z < h(p_y - p_z)$, the polynomial $p_y$ has a monomial $q$ of degree $h(p_y - p_z)$; second, that degree is strictly between $\log\log\log d_v$ and $\frac{1}{2} \log d_v$. A quick look at the formula

$$p_u = p_y + p_z + (p_y - p_z)\, x^{2^{h(p_y - p_z)}}$$

shows that $p_u$ also contains $q$: indeed, by $d_z < h(p_y - p_z)$, it must be the case that $p_y + p_z$ contains the monomial $q$; on the other hand, the minimum degree in $(p_y - p_z)\, x^{2^{h(p_y - p_z)}}$ exceeds $h(p_y - p_z)$. This proves the presence of $q$ in $p_u$, which contradicts (42). This, in turn, means that (41) cannot hold. The monomial $q$ of degree $h(p_y - p_z)$ is the residue that the big-number engine must clear before it
can continue exponentiating degrees. Since dy > ^ log log d, at least one of these 
two conditions applies for any large enough d: 



L{\\og log d) + L(log log log d) ; 



L{d) > 

We use the monotonicity of L to reduce all the cases to the two above. The lower 



bound (39) follows by induction. □ 



Clearing Residues. Recall that $t_a > t_f$ is the time at which the flock $F_a$ is formed at node $a$ of $\mathcal{F}$ after the fragmentation breakpoint $t_f = n^{f_0 n^3}$. With the usual notational convention, it follows from (1, 9) that, in the absence of noise, for $t \ge t_a$,

$$v_a(t) = (P^{t - t_a} \otimes I_d)\, v_a(t_a) = (\mathbf{1}_{n_a} \otimes I_d)\, m_a + \sum_{k > 1} \lambda_k^{t - t_a} \big((C^{1/2} u_k u_k^T C^{-1/2}) \otimes I_d\big)\, v_a(t_a),$$

where $m_a = (\pi_a^T \otimes I_d)\, v_a(t_a)$ is the stationary velocity of the flock $F_a$, ie, the $d$-dimensional vector of first spectral coordinates. As usual, it is understood that $P, C, \lambda_k, u_k$, etc, are all defined with respect to the specific flock $F_a$ and not the whole group of $n$ birds. We subscript $\mathbf{1}$ with the flock size for convenience. By (2), $p = n^3$; hence, by (3, 10),

$$\|v_a(t) - (\mathbf{1}_{n_a} \otimes I_d)\, m_a\|_2 \le e^{-(t - t_a)\, n^{-O(1)} + O(n^3)}. \qquad (43)$$

By the general form of the stationary distribution $\pi_a$ as $(\mathrm{tr}\, C_a^{-1})^{-1} C_a^{-1} \mathbf{1}_{n_a}$, its coordinates are CD-rationals over $O(n \log n)$ bits. So, by Lemma 3.2, each coordinate of $m_a$ is an irreducible CD-rational $p_a/q_a$, where the number of bits needed for $p_a$ and $q_a$ is $O(t_a n \log n + pn) = O(t_a n \log n)$. We denote the maximum bit length over all $d$ coordinates by $\ell(m_a)$. The following holds even in the noisy model:

$$\ell(m_a) = O(t_a n \log n). \qquad (44)$$



Consider a flock $F_c$ associated with a branching node $c$ of $\mathcal{F}$; let $a$ and $b$ be the two children of $c$ in $\mathcal{F}$ (hence $n_c = n_a + n_b$) and assume that $t_a \ge t_b$ and that no node of the forest has more than two children, ie, flocks merge only two at a time.* By Corollary 3.20, the difference in stationary velocities between $F_a$ and $F_b$ satisfies

$$\|m_a - m_b\|_2 \le \frac{\log t_c}{t_c}\, n^{O(n^3)}. \qquad (45)$$



If the difference is null, then by Cases II, III of the previous analysis ($B = 0$), $t_c \le t_a 2^{n^{O(1)}}$. Otherwise, by (44) and the equivalent bound for $\ell(m_b)$,

$$\|m_a - m_b\|_2 \ge n^{-O(t_a n)}. \qquad (46)$$



The two inequalities (45, 46) yield an upper bound on $t_c$. By our treatment of Cases II, III in the proof of Lemma 3.22, we conclude that, whether $m_a = m_b$ or not,

$$t_c \le n^{O(t_a n)}. \qquad (47)$$

This leads to our earlier $\Omega(\log^* t)$ bound on $L(t)$. It is essentially a new derivation of our previous result. We now see how to improve it. Let $\mathcal{F}_0$ be the forest derived from $\mathcal{F}$ by removing all nonbranching internal nodes and merging the adjacent edges in the obvious way. Our assumption implies that each internal node of $\mathcal{F}_0$ has exactly two children. Let $a_0, \ldots, a_k$ ($k > 1$) be an ascending path in $\mathcal{F}_0$ and let $b_i$ denote the unique sibling of $a_i$. The following lemma assumes the noisy model. Its proof is postponed.



Lemma 3.26. Assume that 2'' < logloglogtafc < i^o < ^ai < logW- Then. 
> y/log log tao, for some 0<io < k. 



The Recurrence. We set up a recurrence relation on $L(t)$ to prove the lower bound of Lemma 3.24, ie, $L(t) \ge (1.1938)^{\log^* t - O(\log\log n)}$. Let $t_0 = 2 \uparrow\uparrow \lfloor \log\log n \rfloor$. It is assumed as usual that $n$ is large enough. For $t < t_0$, we have the trivial lower bound $L(t) \ge 1$ (choose the constant in the big-oh to be larger than 1), so we may assume that $t \ge t_0$. The child $b$ of a node $c$ (both defined with respect to $\mathcal{F}_0$) is called near if $t_b \ge (\log t_c)^{2/3}$.



* The simultaneous merging of more than two flocks can be dealt with by breaking ties arbitrarily. Since there are fewer than $n$ merges, this means that in our calculations time might be off by at most an additive term less than $n$. One can verify that this discrepancy has no real effect on any of the derivations and conclusions presented below.
Figure 19: A big flock $F_a$ may join with a small one, $F_b$, to form a flock $F_c$ that produces a much larger number than either one could manufacture on their own. This, however, cannot be repeated in the next step. To create a bigger number at the parent flock of $F_c$, the residual heat in $F_c$ (numbers in box) must be evacuated, which itself requires free energy that can only be provided by a flock that roughly matches $F_c$ in size. In this way, an abundance of spectral shifts forces balance into the forest $\mathcal{F}$.



Lemma 3.27. Any internal node c of To such that tc > 2 has at least one near 
child. 



Proof. By (47), we know that c has a child in the original forest J- such that 
tc = n^^^^o" 
otherwise 



'o; 



set 



We exhibit a near child h for c. If 6o is branching, set h 
b to the nearest branching descendent of bQ. By Lemma 3.22 



the formation times of any node in T and its nonbranching parent differ by at 



most a factor of 2 



,o(i) 



Perturbations make no difference since they occur within 



polynomial time of a switch. Since T has fewer than nodes and tc>2'^'^, 



tb > 2-"°"'tbo > 2" 



1.0(1) 



log tc > (log tc) 



2/3 



with 



□ 



Let $c_0$ be an arbitrary node of $\mathcal{F}_0$ such that

$$t_{c_0} \ge t \ge t_0 = 2 \uparrow\uparrow \lfloor \log\log n \rfloor. \qquad (48)$$



By the previous lemma, we can follow a descending path in To of near children 
Co, ci, . . . , Ci, where tci < 2 < tcj+i- Because to is so much greater than t^ , the 
path has more than a constant number of nodes — in fact, at least on the order of 
log log n. For future use, we note that 
< $\log\log\log t_{c_0}$.    (49)



Lemma 3.28. There exists k > 1 such that 



log log log tco < ^cfc < *Cfe_i < logtco 



Proof. By (49) and Lemma 3.27 there exists some cj in J^o such that 



(loglogtco)^^^ < tc, < logtco- 

Suppose now that all the nodes Cj, for i = j + 1, j + 2, . . . ,1, satisfy f^. > tc^_^. 
Since there are most n nodes along the path from cq to q in .Fo, then, by (49) 
again, 

2''' > tc, > tf; > (loglogt,J^— ^ > 2'''''\ (50) 
This contradiction proves the existence of some node Ck (j < k < I) such that 



4 < < log t 



The argument used in (50) shows that the smallest such k satisfies, via (49), 



tc^.i>r > (log log t 



Co) 



> 2' 



Another application of the inequality above, tc^.i > 2 , allows us to invoke 
Lemma 3.27 By virtue of tco being so big (49) and Ck being a near child of Ck-i 
(by construction). 



e > (logtc,_j'/=^ > 4-«"(logloglogteo)'/' > logloglogteo. 



-8n 



8/3 



□ 



We now prove Lemma 3.24 Setting aj = Ck-i for i = 0, . . . ,k, together 
with (49), the lemma sets the conditions of Lemma 3.26 This shows that ta^ > 
(log log log tafc ) and, conservatively, 

ib,g > (log log log log log taj^/^. 

Nodes $a_0$ and $b_{i_0}$ are roots of disjoint subtrees, so the number of leaves below $a_k$ is at least that of those below $a_0$ added to those below $b_{i_0}$. Since $L$ is a monotone function and, by (48), $a_k$ is an arbitrary node such that $t_{a_k} \ge t$,

$$L(t) \ge L((\log\log\log t)^{1/4}) + L((\log\log\log\log\log t)^{1/3}),$$

for $t \ge t_0 = 2 \uparrow\uparrow \lfloor \log\log n \rfloor$, and $L(t) \ge 1$ for $t < t_0$. We solve the recurrence without the exponents, and then show that ignoring them makes no asymptotic difference. Define $L^*(t) = 1$ for $t < t_0$ and, for any $t \ge t_0$,

$$L^*(t) = L^*(\log\log\log t) + L^*(\log\log\log\log\log t).$$
Given the bound we are aiming for, we can round off $t$ down to the next tower-of-twos. If $L^*(t) = M(\sigma)$, where $\sigma = \log^* t$, we can rewrite the recurrence relation as

$$M(\sigma) = M(\sigma - 3) + M(\sigma - 5),$$

where $M(\sigma) = 1$ for $\sigma < \log^* t_0$. Quite clearly, $M(\sigma)$ upper-bounds the maximum number $n_s$ of leaves in a binary tree $T^*$ where: (i) each left edge is labeled 3 and each right edge 5; and (ii) the sum of the labels along any path is at most $s = \log^* t - \log^* t_0$. Note that $T^*$ is binary: the constraint that each internal node should have exactly two children does not follow from the definition and is therefore added. We seek a lower bound of the form $c x^s$. This means that $x^s \ge x^{s-3} + x^{s-5}$, for $s \ge 5$, and $c x^s \le 1$ else. The characteristic equation is

$$x^5 - x^2 - 1 = 0.$$

We choose the unique real root $x_0 \approx 1.19385$; this leads to $c = x_0^{-5}$. This shows that $n_s \ge x_0^{s-5}$; hence,

$$L^*(t) \ge x_0^{\log^* t - \log\log n - 5}.$$
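The constants in this calculation can be checked numerically. The sketch below finds the real root of $x^5 - x^2 - 1$ by bisection, counts the leaves of the tree with edge labels 3 and 5 under a label budget $s$, and compares the count with $x_0^{s-5}$. The function names are ours, and the leaf count assumes that a node is expanded whenever its remaining budget allows both children (that is, whenever $s \ge 5$), which is the choice that maximizes the number of leaves.

```python
import math
from functools import lru_cache

def characteristic_root(lo=1.0, hi=2.0, iterations=80):
    """Bisection for the root of x^5 - x^2 - 1 in [lo, hi]; the polynomial is increasing there."""
    f = lambda x: x**5 - x**2 - 1
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

@lru_cache(maxsize=None)
def leaves(s):
    """Leaves of the binary tree with left label 3, right label 5, and label budget s."""
    if s < 5:
        return 1
    return leaves(s - 3) + leaves(s - 5)

x0 = characteristic_root()
height_factor = 1 / math.log2(x0)   # the constant multiplying log n in the tower height
bound_holds = all(leaves(s) >= x0 ** (s - 5) for s in range(200))
```

The value of `height_factor` is where the constant 3.912 in the bound $2 \uparrow\uparrow (3.912 \log n)$ comes from: solving $x_0^{h} = n$ for the height gives $h = \log n / \log x_0$.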

It is obvious that the binary tree $T$ associated with the recurrence for $L(t)$ embeds in $T^*$ with the same root. We claim that it is not much smaller: specifically, no leaf in $T$ has more than a constant number of descendents in $T^*$. This implies immediately that

$$L(t) = \Omega(L^*(t)),$$

which proves Lemma 3.24. □



To prove our claim, we show that no path in $T^*$ extends past its counterpart in $T$ by more than a constant number of nodes. We model simultaneous, parallel walks down the trees as a collaborative game between two players, Bob and Alice, who take turns. Initially, both of them share the same value

$$t_A = t_B = t \ge t_0.$$

In one round, Bob modifies his current value by taking iterated logs. He is entitled to up to 5 logarithm iterations; in other words, he can set $t_B \leftarrow \log t_B$ or

$$t_B \leftarrow \log\log\log\log\log t_B,$$

or anything in-between. Alice mimics Bob's move but then completes it by taking a fractional power; for example, if Bob opts for, say, $\log\log t_B$, then Alice resets her value to $(\log\log t_A)^\alpha$, where $\alpha$ is a number between $\frac{1}{4}$ and 1. To summarize, Bob chooses the number of log iterations and Alice chooses $\alpha$: they can change these parameters at each round. A player's score is the number of rounds before his or her value falls below (or at) $t_0$. Alice's score cannot be higher than Bob's, so the latter is expected to play the last rounds on his own.
The joint goal of the players is to maximize their score differential. Regardless of either player's strategy, we show that Bob's score never exceeds Alice's by more than a constant. This follows directly from the next two lemmas, whose proofs we postpone.
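A small simulation gives a feel for the game. The sketch below plays the single-log rule for both players and lets Alice apply a fixed exponent after each logarithm. The threshold `t0 = 3`, the starting value, and the default exponent `alpha = 0.25` are hypothetical choices for illustration; in particular, the exponent is an assumption about Alice's most aggressive move. The score here counts the round in which the value first drops to the threshold, which shifts both scores by one and leaves the differential unchanged.

```python
import math

def score(t, t0, alpha=1.0):
    """Rounds played until the value is at most t0; each round takes one log2, then the power alpha."""
    rounds = 0
    value = t
    while value > t0:
        value = math.log2(value) ** alpha
        rounds += 1
    return rounds

t = 2 ** 65536      # a tower of five twos
t0 = 3              # hypothetical threshold standing in for t_0

bob = score(t, t0)                  # Bob: t_B <- log t_B
alice = score(t, t0, alpha=0.25)    # Alice: t_A <- (log t_A)^alpha
differential = bob - alice
```

On this instance Bob survives two more rounds than Alice. The claim in the text is that this differential stays bounded by a constant no matter how tall the starting tower is; the simulation cannot show that, since towers of height six and above do not fit in memory.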

Lemma 3.29. The score differential is maximized when Bob always selects the single-iterated log rule and Alice follows suit with $\alpha = \frac{1}{4}$; in other words, $t_B \leftarrow \log t_B$ and $t_A \leftarrow (\log t_A)^{1/4}$.



With the strategy of the lemma, Bob's score is log* t — log* to- Within an 
additive constant, Alice's score is at least the minimum h such that Ch > t, where 
Cj is defined by cq = t^ and, for i > 0, Cj = 2^^^-^. To see why, note that the 
inverse of the function z ^ (logz)^/^ is z i— > 2^ ; taking logarithms on both sides 
gives the recurrence on Cj. 



Lemma 3.30. Forty to, min{ /i | c/j > t } > log* t - log* to - 0(1). 

This validates our claim that no path in $T^*$ extends past its counterpart in $T$ by more than a constant number of nodes. This fills in the missing part in the proof of Lemma 3.24 and establishes the upper bound on the convergence time claimed earlier.



Proof of Lemma 3.26 We begin with a few technical facts. Recall from the 
"Clearing Residues" section that the flock Fc is associated with a branching node 
c of J'-" and that a and b are its two children in J^; furthermore, ta > tf, and ta > tf, 
where tf = n-^°"'^ . Assume that the velocity vector of Fa at time ta is of the form 



where Ua G 



Va{ta) = (ln„ ® Id)in.a + {Ua ® Id)lJ'a + Ca 

/ia £ IR'^, and, for some real r, 

f 2*/ < r < t'J'; 
i{ma) = O(loglogr); 

\\Ua\\oD = 1 & Ua>0; 
,0(i) 



(51) 



(52) 



[ Kah < e 



Note that the d-dimensional rational vector $m_a$ is not defined as the stationary
velocity $\mathbf{m}_a$ of $F_a$, though it plays essentially the same role. The term $(u_a \otimes I_d)\mu_a$
creates the residue $\|\mu_a\|_2$ of $F_a$. Unless $F_b$ can "destroy" this residue when it joins
with $F_a$, one should not expect the flock formation time to grow exponentially.
The crux is then to show that only a flock $F_b$ with many birds can perform such
a task. The following result says that, if the flock $F_b$ settles too early, its effect on
the residue of $F_a$ is negligible. The conditions on $F_c$ stated below differ slightly
from those for $F_a$ to make them closed under composition. The lemma below also
covers the case $n_b = 0$, when the transition from $F_a$ to $F_c$ involves the addition
of an edge within the same flock. (Here, too, we assume without loss of generality
that these additions occur only one at a time within the same flock.) We postpone
the proof of this result.

Lemma 3.31. Suppose that Fa undergoes no perturbation. If node b is well de- 
fined, then assume that ti, < log log r. Whether node b exists or not, 

Vc{tc) = (Ine ® Id)^a + {Uc ® IdjfJ'C + Cc , 

where 

' ll^tclloo = 1 & > 0; 

J|Cc||2<n||Ca||2 + e--'. 
Furthermore, if node b is well defined, then m;, = xi\.a 7^ rria. 



Remark 2.3. It might be helpful to explain, at an intuitive level, the meaning
of the three terms in the expression for $v_a(t_a)$, or equivalently $v_c(t_c)$: $m_a$ is a
low-precision approximation of the stationary velocity $\mathbf{m}_a$; the vector $(u_a \otimes I_d)\mu_a$
creates the residue; the remainder $\zeta_a$ is an error term. The term $m_a$ is a
low-resolution component of the velocity that any other flock $F_b$ has to share if it is
to create small angles with $F_a$ (the key to high flock formation times). Think of it
as a shared velocity caused by, say, wind affecting all flocks in the same way. This
component must be factored out from the analysis since it cannot play any role in
engineering small angles. This is a manifestation of the relativity principle that
only velocity differences matter. To create small angles with $F_a$, incoming flocks
$F_b$ must attack the residue vector $(u_a \otimes I_d)\mu_a$. Of course, they could potentially
take turns doing so. Informally, one should read the inequalities of the lemma as
a repeat of (52). The lemma states a closure property: unless $F_b$ brings many bits
to the table (via a formation time at least $\log\log r$), conditions (52) will still hold.
These conditions prevent the creation of small angles between flocks, and hence
of huge formation times. In other words, flocks that settle too early cannot hope
to dislodge the residue $\|\mu_a\|_2$. The reason is that this residue is shielded in three
ways: first, it is too big for the error term $\zeta_a$ to interfere with it (compare $r^{-O(1)}$
with $e^{-r^2 n^{-O(1)} + n^{O(1)}}$); second, it is too small to be affected by $m_a$ (compare $1/r$ with
a rational over $O(\log\log r)$ bits); third, all of its coordinates have the same sign
($u_a \ge 0$), so taking averages among them cannot cause any cancellations. This
form of "enduring" positivity is the most remarkable aspect of residues.



By (52), the lemma's bounds imply that 



< II/.CII2 < i & ||Cc||2 < e--^-°'^'+"°'^ 
which brings us back to (52). If c has an (unperturbed) parent d and sibling b',
then we can apply the lemma again. Note that composition will always be applied
for the same value of r, ie, one that is not updated at each iteration. In other
words, the first two lines of (52), unlike the last three, are global inequalities
that do not change with each iteration. This closure property is not foolproof.
First, of course, we need to ensure that $t_{b'} < \log\log r$. More important, we lose
a polynomial factor at each iteration, which is conveniently hidden in the big-oh
notation. So we may compose the lemma only $n^{O(1)}$ times if we are to avoid any
visible loss in the bounds of (52). Since the forest has fewer than n nodes, this
means that, as long as its conditions are met, we can compose the lemma with
ancestors of c to our heart's content and still get the full benefits of (52).



The provision that b might not be well defined allows us to handle nonbranching
switch nodes with equal ease. Recall that $v_c(t_c)$ is the velocity leading to time
$t_c$, ie, before the flock $F_b$ has had a chance to influence it. The provision in
question might thus appear somewhat vacuous. Its power will come from allowing
us to apply the lemma repeatedly with no concern whether a node has one or two
children. A related observation is that nowhere shall we use the fact that $t_a$ is
the actual formation time of $F_a$. It could be replaced in (51) by any time strictly
between $t_a$ and $t_c$. We thus trivially derive a "delayed" version of Lemma 3.31. We
summarize its two features: (i) Lemma 3.31 can be composed iteratively as often
as we need to; (ii) node a need not be an actual node of $\mathcal{F}$ but one introduced
artificially along an edge of $\mathcal{F}$.

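What if $F_a$ undergoes a perturbation between $t_a$ and $t_c$? Then the flock
$F_a$ sees its velocity multiplied by $I_{n_a} \otimes \hat\alpha$, where $\hat\alpha$ is the diagonal matrix with
$\alpha = (\alpha_1, \dots, \alpha_d)$ along the diagonal and rational $|\alpha_i| \le 1$ encoded over $O(\log n)$
bits. Observe that the two matrices $P_a \otimes I_d$ and $I_{n_a} \otimes \hat\alpha$ commute; therefore the
perturbation can be assumed to occur at time $t_a$. This means that, in lieu of (51),
we have, using standard tensor rules,

\[ v_a(t_a) = (I_{n_a} \otimes \hat\alpha)(\mathbf{1}_{n_a} \otimes I_d)\, m_a + (I_{n_a} \otimes \hat\alpha)(u_a \otimes I_d)\, \mu_a + (I_{n_a} \otimes \hat\alpha)\, \zeta_a \]
\[ \phantom{v_a(t_a)} = (\mathbf{1}_{n_a} \otimes \hat\alpha)\, m_a + (u_a \otimes \hat\alpha)\, \mu_a + (I_{n_a} \otimes \hat\alpha)\, \zeta_a . \]

Bringing it in the format of (51), we find that

\[ v_a(t_a) = (\mathbf{1}_{n_a} \otimes I_d)\, m_a + (u_a \otimes I_d)\, \mu_a + \zeta_a , \]

with the new assignments:

\[ m_a \leftarrow \hat\alpha\, m_a ; \quad u_a \leftarrow u_a ; \quad \mu_a \leftarrow \hat\alpha\, \mu_a ; \quad \zeta_a \leftarrow (I_{n_a} \otimes \hat\alpha)\, \zeta_a . \]

It is immediate that the conditions of (52) still hold: the only difference is that

\[ \ell(m_c) \le \ell(m_a) + O(\log n). \]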
By (52), logr >t/



n 



fon^ 



, SO ^(rric) stays in O(loglogr) as long as the number 



of compositions is O(n^), which it is. We summarize these observations: 

Lemma 3.32. Let $c_0, \dots, c_l$ be an ascending path in $\mathcal{F}$ and let $d_i$ be the sibling, if
any, of $c_i$. Assume that $c_0$, possibly an artificial node, satisfies the conditions of
node a in (52) and that $t_{d_i} < \log\log r$ for all $d_i$. Then,

\[ v_{c_l}(t_{c_l}) = (\mathbf{1}_{n_{c_l}} \otimes I_d)\, m_{c_l} + (u_{c_l} \otimes I_d)\, \mu_{c_l} + \zeta_{c_l} , \]



where 



£(mcj = O(loglogr); 

'\Ur. lino = 1 & >0; 



J'Ci I loo 
rnOm / ||„ II / I 



For all di, m^. 



JICc 



< e" 



^0(1)+„0(1) 



We are now equipped with the tools we need to prove Lemma 3.26. Recall that
$a_0, \dots, a_k$ ($k > 1$) is an ascending path in $\mathcal{F}_0$ and $b_i$ denotes the unique sibling of
$a_i$. (Note that $a_0 \cdots a_k$ is a path in $\mathcal{F}_0$ whereas, in Lemma 3.32, $c_0 \cdots c_l$ is a path
in $\mathcal{F}$.) Also,



2^ ^ < logloglogt^fe < C < < logtafe- 



4 
ao 



Assume, by contradiction, that 4^ < y^log log ta^ for i = 0, . . . , A: — 1. As we 
observed earlier. Lemma 3.22 ensures that, regardless of noise, the ratio between 
the formation times of any node in T and that of its nonbranching parent is 
at least 2~"'^'^\ Since there are fewer than switches, this implies that F^o 
can undergo switches or perturbations only between ta^ and taQ2'^° ^ . Because 
> t'io > 2^ ^ , with tf = n-^""'^, this shows that the entire time interval [^tau ^ai) 
is free of switches and noise. Let a be the last node in T from ao to oi and let cq 
be the artificial parent of a corresponding to the flock Fa at time t^i — 1: we set 

flock FcQ are heavily damped. Indeed, 



riap and t^^ = ta^ — 1. The bound in ( 43 ) ensures that the oscillations in the 



■>C0 ) 



where, because of the magnitude of t^i , 

-(iaj/2-l)n-0(l)+0(n3) ^ -*„.r,-0(l) 



lie 



coll2 



< e" 



< e" 



(53) 



(54) 



where r = ■^ta{^. The rest of the sequence {cj} is now entirely specified. In 
particular, note that ci = oi and do = feo- By extension, rrico = rria; so, by (45), 
therefore, rrico = mfeg + ficg > where



(55) 



As we shall see, the presence of the square $r^2$ in the exponent of (54) ensures that
the oscillations of $F_{c_0}$ are too small to interfere with the residue $\|\mu_{c_0}\|_2$. Writing
$m_{c_0} = \mathbf{m}_{b_0}$, it follows from (53) that

\[ v_{c_0}(t_{c_0}) = (\mathbf{1}_{n_{c_0}} \otimes I_d)\, m_{c_0} + (\mathbf{1}_{n_{c_0}} \otimes I_d)\, \mu_{c_0} + \zeta_{c_0} , \]

which matches (51), with $u_{c_0} = \mathbf{1}_{n_{c_0}}$. Since all the nodes $d_i$ are of the form $b_j$,

\[ t_{d_i} < \sqrt{\log\log t_{a_0}} < \log\log r. \]

Thus, we will be able to apply Lemma 3.32 once we verify that all conditions
in (52) are met:



[2*/ < T < tea ]'■ This follows from our setting r = ^(^co + 1)^^^ and our 
assumption that ta^ > 2 



2V 



[^(filc 



2V" 



0(log log r) ] : Because r > 2 
V^loglog tao nlogn < (log log 



n2/3 



o(log logr). 



The desired bound follows from ( 44 ) : 



■CO J 



(rrifej = O{tbonlogn) = 0(^/logIog7^ nlogn) < loglogr. 



.0(1) 



^ ll^colb ^ r ]• '^^^ upper bound comes from (55). For the lower 



bound, note that rric,, = iria, with ta < tao^"^*^'. Another application of (44) 
shows that 

£{ui,^) = 0{tanlogn) < t^^ < r. 

We just saw that £(m;,„) < loglogr, so /^co = — nifeg is a d-dimensional 
vector with rational coordinates over fewer than 2r bits. The lower bound 



follows from the fact that 0. By Lemma 3.22 the stationary velocities 

lUcQ and m;,g cannot be equal, otherwise the two flocks Fa and F^g could not 
take so long to meet at time ta^. Indeed, the time elapsed would be at least 
tai — ta, (since ta > tbo), which would greatly exceed the limit of ta2^' 
allowed. 



,0(1) 



Co 1 1 oo 



1 & > & llCcolb < 



2„-0(l)_^„0(l) 



: The bounds follow 



from (54) and n, 



CO 
Let Q be the node a^. By applying Lemma 3.32 at q, we find that m6j._j = m^^^



Applying the same lemma now at node ci^i shows that 
where 



ll/^q_il|2 > e" 



tO(1) 



ic,_il|oo = 1 & > ; 



II/. II / _T-2„-0{l) , 0(1) 

The lemma also allows us to express the stationary velocity at q_i: 



-1 = {-^ci^i ^ Id)Vci_dtc,^i) 
By the triangle inequality, it follows that 



> 



> min{(7rc;_Jj}e ™ -Vde 



-0(1)+„0(1) ^ ^_^„0{l) 



By (45), 



I II ^ logiai. Ofn^) 



therefore, since ta^. > 2*^ > n'' 



+ <r Ur^ w. 11-2 ^ TriO(i) ^ ^t1-5 

tafe < ||mc,_, - mb^_J|2 <e <e , 



which contradicts our assumption that r = jiai^ < (log^afc)"*^^^- 



□ 



Proof of Lemma 3.31 , Using the shorthand u"' = P^" *"^Ua and C° = {Pa" ' 



Id)Ca, we express the velocity of the flock Fa at time tc- From 

Va{tc) = {Pt''^ (^Id)Va{ta), 



we find that, by (51), 



(56) 



Because $P_a$ is an averaging operator, $\|P_a^{t_c - t_a} u_a\|_\infty \le \|u_a\|_\infty = 1$. The vector $u_a$
is nonnegative, so



a uiicxj_„^iia uiii ria a " — Ua ^ o. 

> ^^^lua > ^min{(7r„)i}||na||oo > n-^(i). 



1 -1 T ptc — ta„, \ 1 ptc~ta„ 
Similarly, by convexity,



therefore, 



< \/(in^||Ca||oo < \fd^a\\Ca\\2] 

j n~^(i) < ||n"||oo < 1 & > 0; 

1 ||C||2<n||Ca||2. 



Case I. Node b is well defined and < log log r: Since, by (52), tc > 
8*/, with tf = n^o"', 

-{t, - tb)n-O(i) + 9(^3) < _^2. 
so, by applying ( [43| ) to the flock Fh, we find that 

\\v,{Q - (1„, /,)m,||2 < e-(*-*^)--°<^'+0("^); 

hence 

where lUclb < 1- It follows from (|56|) that 



+ 1 i: 



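By (51), the stationary velocity of $F_a$ is equal to

\[ \mathbf{m}_a = (\pi_a^T \otimes I_d)\, v_a(t_a) = (\pi_a^T \otimes I_d)\bigl((\mathbf{1}_{n_a} \otimes I_d)\, m_a + (u_a \otimes I_d)\, \mu_a + \zeta_a\bigr) \]
\[ \phantom{\mathbf{m}_a} = m_a + (\pi_a^T u_a)\, \mu_a + (\pi_a^T \otimes I_d)\, \zeta_a . \]

By the triangle inequality, it follows that

\[ \|\mathbf{m}_a - m_a\|_2 \ge \pi_a^T u_a\, \|\mu_a\|_2 - \|\pi_a^T \otimes I_d\|_F\, \|\zeta_a\|_2 \]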



which shows that 



> mm{(7ra)j}e -Vde ^ >e 



ma / ma . 



0(1) 



Note also that, by (59), 



|ma - mb||2 < ||ma - ma||2 + ||ma - mfc||2 

< vrjuall/ialb + IkJ (g) /dllFllCalb + ||ma - m6||2 
We bound each term on the right-hand side: by (52) and Cauchy-Schwarz,



vrjuall^alb < ^IKalblllialb < ^Vrhi\\Ua\\oo < \ 

By Q and tc> > 8*/, 



n . 



Also, \\t:1 ® UWf = 0{l) and, by (52), ||Ca||2 < e"- 



r2„-0(l)+„0(l)_ 



; therefore 



|ma - mfe||2 < 



By (44), our assumption that $t_b < \log\log r$ implies that

\[ \ell(m_b) = O(n (\log n) \log\log r) < (\log\log r)^2 . \]

Since, by (52), $\ell(m_a) = O(\log\log r)$, the squared distance $\|m_a - m_b\|_2^2$ is a rational
over $O(\log\log r)^2$ bits: being less than $1/r$ implies that it is actually zero; hence
$m_a = m_b$, as claimed in the lemma. We verify from (57) that

and 



42? II a 1 1 
^J'C — A*a||^ ||oo 



Uc= ( Q ) WU^WJ 



satisfy the conditions of the lemma. By ( |58| ), 

Vc{tc) = (Ine ® /d)ma + {Uc ® Id)lJ^c + Cc , 

where 



(-a 



By (57) and H^db < the lemma's condition on C,c is trivially satisfied. 



Case II. Node h is not defined: We set Cc = C°", l^c 



It" II 11^. This matches the identity (56) with the one claimed in the lemma. □ 



and Uc 



Proof of Lemma 3.29. Suppose that Bob does not always follow the single-iterated
log rule. We show how to force him to do so without decreasing the
score differential. If Bob uses the rule $t_B \leftarrow \log\log t_B$, then Alice follows up with
$t_A \leftarrow (\log\log t_A)^{\alpha}$. Let us break this round into two parts:

1. $t_B \leftarrow \log t_B$ and $t_A \leftarrow \log t_A$;

2. $t_B \leftarrow \log t_B$ and $t_A \leftarrow (\log t_A)^{\alpha}$.

We proceed similarly for higher log-iterations and apply the modification systematically.
This transformation increases the scores of the players but it does not
change their difference. Finally, we apply one last transformation to the new
game, which is to convert all of Alice's moves into $t_A \leftarrow (\log t_A)^{2/3}$. This can only
increase the score differential. □
Proof of Lemma 3.30. Consider the two recurrence relations:

\[ a_0(x) = b_0(x) = x, \]

and, for $h > 0$,

\[ a_h(x) = 2^{a_{h-1}(x)} , \qquad b_h(x) = 2^{b_{h-1}(x)} + 2 . \]

Recall that c/j is defined by cq = ^^'i, for h > 0, Ch = 2^'^'^-^. We verify by 
induction that, for any /i > 0, 

^2''h-l(41°g*0+2) 
Ch = 2^ 

To prove the inequality we seek,

\[ \min\{\, h \mid c_h \ge t \,\} \ge \log^* t - \log^* t_0 - O(1), \]

where $t > t_0$, we may assume that $t > 2^{t_0}$, otherwise the result is trivial. The
assumption implies that the minimum h is positive; therefore it suffices to prove
that, for all $h \ge 0$,

\[ b_h(4 \log t_0 + 2) \le a_h(4 \log t_0 + 4). \tag{61} \]

We see by induction that, for all $h \ge 0$, $x \ge 2$, and $\varepsilon > 0$,

\[ a_h(x) + \varepsilon \le a_h(x + \varepsilon 2^{-h}). \tag{62} \]

The case $h = 0$ is obvious, so consider $h > 0$. Note that, for any $y \ge 2$,

\[ 2^y + \varepsilon \le 2^{y + \varepsilon/2} , \]

which follows from

\[ \ln(1 + \varepsilon 2^{-y}) \le \varepsilon 2^{-y} \le \tfrac{\ln 2}{2}\, \varepsilon . \]

Since $a_{h-1}(x) \ge 2$, this shows that

\[ a_h(x) + \varepsilon = 2^{a_{h-1}(x)} + \varepsilon \le 2^{a_{h-1}(x) + \varepsilon/2} \le 2^{a_{h-1}(x + \varepsilon 2^{-h})} = a_h(x + \varepsilon 2^{-h}), \]

which proves (62). Next, we show by induction that, for all $h \ge 0$ and $x \ge 2$,

\[ b_h(x) \le a_h(x + 2 - 2^{1-h}). \tag{63} \]

The case $h = 0$ again being obvious, assume that $h > 0$. By (62),

\[ b_h(x) = 2^{b_{h-1}(x)} + 2 \le 2^{a_{h-1}(x + 2 - 2^{2-h})} + 2 = a_h(x + 2 - 2^{2-h}) + 2 \le a_h(x + 2 - 2^{1-h}), \]

which establishes (63); and hence (61).

□
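
As a sanity check on the two inductive inequalities used in this proof, the towers can be evaluated directly for small heights. The sketch below assumes the recurrences $a_h(x) = 2^{a_{h-1}(x)}$ and $b_h(x) = 2^{b_{h-1}(x)} + 2$ with $a_0(x) = b_0(x) = x$; the helper names and the small relative tolerance (to absorb floating-point rounding) are choices made here, and heights above 2 are avoided because the values overflow double precision.

```python
def a(h, x):
    """a_0(x) = x and a_h(x) = 2 ** a_{h-1}(x)."""
    return x if h == 0 else 2.0 ** a(h - 1, x)

def b(h, x):
    """b_0(x) = x and b_h(x) = 2 ** b_{h-1}(x) + 2."""
    return x if h == 0 else 2.0 ** b(h - 1, x) + 2.0

TOL = 1 + 1e-12  # relative slack for floating-point rounding

def holds_62(h, x, eps):
    """Check a_h(x) + eps <= a_h(x + eps * 2^-h)."""
    return a(h, x) + eps <= a(h, x + eps * 2.0 ** (-h)) * TOL

def holds_63(h, x):
    """Check b_h(x) <= a_h(x + 2 - 2^(1-h))."""
    return b(h, x) <= a(h, x + 2.0 - 2.0 ** (1 - h)) * TOL

for h in range(3):
    for x in (2.0, 2.5, 3.0, 4.0, 5.0):
        assert holds_63(h, x)
        for eps in (0.01, 0.5, 1.0, 2.0):
            assert holds_62(h, x, eps)
```

A numerical check of this kind proves nothing for general $h$, but it catches sign or index slips in the statements of (62) and (63).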
4 The Lower Bound



We specify initial positions and velocities for n birds, using only $O(\log n)$ bits
per bird, and prove that their flock network converges only after a number of
steps equal to a tower-of-twos of height $\log n$. Our proof is entirely constructive.
The hysteresis assumption of the model is not used and, in fact, the lower bound
holds whether the model includes hysteresis or not. Our construction is in two
dimensions, d = 2, but it works for any d > 0. The n birds all start from the
X-axis (think of them on a wire), and fly in the (X, Y)-plane, merging in twos,
fours, eights, etc, until they form a single connected flock. This process forms
a fusion tree T of height log n. (We assume throughout this section that n is a
large odd power of two.) Every flock formed in the process is a single path. The
transition matrix is that of a lazy symmetric random walk with, at each node, a
probability 1/3 of staying put.
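
To make the transition matrix concrete, the following sketch builds it for a path of any length with exact rational arithmetic. It assumes the weights just described (1/3 for staying put, the remaining 2/3 split evenly among the neighbors on the path) and checks that the row vector $(\frac12, 1, \dots, 1, \frac12)/(n-1)$ is stationary; the function names are illustrative, not the paper's.

```python
from fractions import Fraction

def path_transition(n):
    """n x n matrix of the lazy walk on a path with n >= 2 nodes."""
    third = Fraction(1, 3)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        neighbors = [k for k in (i - 1, i + 1) if 0 <= k < n]
        P[i][i] = third                      # stay put with probability 1/3
        for k in neighbors:                  # split 2/3 among the neighbors
            P[i][k] = 2 * third / len(neighbors)
    return P

def stationary(n):
    """Candidate stationary distribution (1/2, 1, ..., 1, 1/2) / (n - 1)."""
    w = [Fraction(1)] * n
    w[0] = w[-1] = Fraction(1, 2)
    return [x / (n - 1) for x in w]

def left_multiply(pi, P):
    """Row vector times matrix."""
    n = len(P)
    return [sum(pi[i] * P[i][k] for i in range(n)) for k in range(n)]

P = path_transition(4)
assert left_multiply(stationary(4), P) == stationary(4)
```

For four birds this produces the rows (1, 2, 0, 0)/3, (1, 1, 1, 0)/3, (0, 1, 1, 1)/3 and (0, 0, 2, 1)/3.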




Figure 20: Birds join in flocks of size 2, 4, 8, etc, up in the fusion tree, each time flying
in a direction closer to the Y-axis. The angle decreases exponentially at each level, so
the time between merges grows accordingly. The big arrow indicates the Markov chain
corresponding to a 4-bird flock. At each state, the probability of staying put is 1/3, with
the remaining 2/3 being distributed uniformly among the outgoing edges.

Initially, the velocity of each bird has its Y-coordinate equal to 1. Since averaging
these velocities will only produce 1, the birds move up away from the X-axis
forever at constant speed 1. We can then factor out the Y coordinates and focus
our entire investigation on the birds' projections on the X-axis. In fact, we might
as well view the birds as points moving along the X-axis and joining into edges
when their distance is 1 or less. In other words, we let $x(t)$ denote the vector
$(x_1(t), \dots, x_n(t))$ and let $v(t) = x(t) - x(t-1)$. The coordinates of the velocity
vector $v(t)$ will quickly decrease, but we should not be mistaken into thinking
that the birds slow down accordingly. Because of the Y motion, all the birds will
always fly at speeds very near 1. Let c be a large enough odd integer: one will
easily check that c = 11 works, but no effort was made to find the smallest possible
value. We leave c as a symbol to make it easier to follow the derivations.



Initial Conditions:

\[ x(0) = \bigl(0, \tfrac23, 2, \tfrac83, \dots, 2l, 2l + \tfrac23, \dots, n-2, n - \tfrac43\bigr)^T ; \]
\[ v(1) = \bigl(n^{-c}, 0, -n^{-c}, 0, \dots, n^{-c}, 0, -n^{-c}, 0\bigr)^T . \]



Each nonleaf node a of the fusion tree T has associated with it a flock of $2^j$
birds whose network is a single path: the index $j > 0$ is also the height of the
node. The flock $F_a$ at node a is formed at a time $t_j$ that depends only on the
height in T; by convention, $t_1 = 0$. Given a node a at height $j > 0$, we denote
by $v^a$ the $2^j$-dimensional velocity vector of the flock $F_a$ at time $t_j$ and by $m_a$ its
stationary velocity. For $j > 2$, if $l$ and $r$ denote the left and right children of
a, respectively, then $v^l = v^r$. In other words, two sibling flocks start out with
the same initial velocity. At time $t_j$, because of noise called flipping, the velocity
vectors of these flocks will have evolved into $\mathcal{L} v^l$ and $-\mathcal{L} v^r$, respectively, where
$\mathcal{L}$ is a linear transformation specific to that sibling pair. This implies that

\[ v^a = \begin{pmatrix} \mathcal{L} v^l \\ -\mathcal{L} v^l \end{pmatrix} . \]

The stationary velocity of the flock $F_a$ satisfies

\[ m_a = \frac{1}{2^j - 1} \bigl(\tfrac12, 1, \dots, 1, \tfrac12\bigr)\, v^a . \tag{64} \]



The initial conditions provide the velocity vectors of the 2-bird flocks at height 1
one step after t = 0. It follows that, if a is a node at height j = 1, the stationary
velocity $m_a$ is equal to $\frac12 (-1)^{k+1} n^{-c}$, where k is the rank of a among the nodes at
height 1 from left to right. For consistency, we must set $v^a = (-1)^k n^{-c} (1, -2)^T$.
This choice is dictated by the initial conditions set above, so that, for any $j \ge 1$,
the velocity of the flock at a at time t ($t_j < t \le t_{j+1}$) is equal to $P_j^{\,t - t_j} v^a$, where

\[ P_j = \frac13 \begin{pmatrix}
1 & 2 & 0 & \cdots & 0 \\
1 & 1 & 1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & 1 & 1 \\
0 & \cdots & 0 & 2 & 1
\end{pmatrix} . \tag{65} \]
At height 2 and above, some flocks undergo a velocity flip at chosen times: this
means that the sign of their current velocity is reversed and it becomes $-P_j^{\,t - t_j} v^a$
at time t. By abuse of notation, we say that the node flips: it is instantaneous
and does not count as an averaging transition. When does this happen and why?
Fix an integer f = 3. Again, we leave this constant as a symbol for clarity.

Flipping Rule: It applies at time $t = t_j + n^f$ to any flock of a left
child of even height $j > 1$ and to any flock of a right child of odd height
$j > 2$.




Figure 21: Flipping alternates between left and right. Leaves were added to indicate 
that nodes at height 1 correspond to 2-bird flocks. 

Flips are convenient to make flocks collide. We show later that they conform to
the noisy flocking model. At height 2 and higher, any two sibling nodes $l$ and $r$
are assigned the same velocity vector $v^l = v^r$. Their corresponding flocks evolve
in parallel for $n^f$ steps, like two identical copies. Then, one of them "flips" (which
one, left or right, depends on the height in the tree), meaning that the two velocity
vectors become opposite of each other. The flip type alternates between left and
right as we go up the tree. Although flipping has only a trivial effect on velocities,
which decays over time, we must be careful that it does not break flocks apart.
We could rely on hysteresis to prevent this from happening, but as we said earlier
we seek a lower bound that holds whether hysteresis is present or not. That is
why we introduce the lag $n^f$. The averaging operations act like glue and the glue
needs to dry up before changing direction.

Up to sign, $v^a$ depends only on the height j of node a, so we focus our attention
on the left spine of the tree, denoted $a_1, \dots, a_{\log n}$ in ascending order. The exact
behavior of every flock in the system can be found in replica either at a node $a_j$
or at a sibling of such a node. That is why, when checking the structural integrity
of the flocks, it is not quite enough to concentrate on the left spine: we must
also check the right children hanging off of it. For any $1 \le j < \log n$, we define
$\theta_j = t_{j+1} - t_j$ as shorthand for the lifetime of the flock $F_{a_j}$. Our task is two-fold.
First, we must show that $|m_{a_j}|$ decreases very fast: we prove that (roughly)

\[ |m_{a_j}| \le e^{-\Omega(|m_{a_{j-1}}|^{-1})} , \]

which implies that $\theta_j$ is exponentially larger than $\theta_{j-1}$; hence the logarithmic
tower-of-twos lower bound. Second, we must prove the integrity of the scheme:
that each flock remains a single path over its lifetime; that two flocks meet when
and where they should; that flipping fits within the model; etc.



4.1 The Early Phases

The proofs are technical but one can develop some intuition for the process they
mean to explain by working out the calculations for $m_{a_j}$ (j = 1, 2, 3) explicitly.
At time $t = t_1 = 0$, the network consists of the edges (1, 2), (3, 4), . . . , (n − 1, n).
We already saw that the 2-bird flock $(B_1, B_2)$ has initial and stationary velocities

\[ v^{a_1} = n^{-c} (-1, 2)^T \quad \text{and} \quad m_{a_1} = \tfrac12 n^{-c} . \tag{66} \]




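Flying at Height 1. Because the velocity at time t captures the motion ending
at t, the velocity of the flock $(B_1, B_2)$ at time 1 is $P_1 v^{a_1}$. By (15), for $t > 0$,

\[ x(t) = x(0) + \sum_{s=0}^{t-1} P^s v(1), \]

which gives us

\[ \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} x_1(0) \\ x_2(0) \end{pmatrix} + \sum_{s=0}^{t-1} P_1^{\,s+1} v^{a_1} . \]

Diagonalizing $P_1$ shows that, for any integer $s \ge 0$,

\[ P_1^{\,s} = \frac12 \begin{pmatrix} 1 \\ 1 \end{pmatrix} (1, 1) + \frac12 (-3)^{-s} \begin{pmatrix} 1 \\ -1 \end{pmatrix} (1, -1). \]

It follows that, for $0 = t_1 \le t \le t_2$,

\[ x_1(t) = \tfrac12 n^{-c}\, t + \tfrac12 n^{-c} \sum_{s=0}^{t-1} (-3)^{-s} = \tfrac12 n^{-c} \bigl( t + \tfrac34 + \tfrac14 (-3)^{1-t} \bigr) ; \]
\[ x_2(t) = \tfrac23 + \tfrac12 n^{-c}\, t - \tfrac12 n^{-c} \sum_{s=0}^{t-1} (-3)^{-s} = \tfrac23 + \tfrac12 n^{-c} \bigl( t - \tfrac34 - \tfrac14 (-3)^{1-t} \bigr) . \tag{67} \]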
Note that $B_1$ always stays to the left of $B_2$ and their distance is

\[ x_2(t) - x_1(t) = \tfrac23 - \tfrac34 n^{-c} \bigl( 1 - (-\tfrac13)^t \bigr) . \tag{68} \]

Left to their own devices, the two birds would slide to the right at speed $m_{a_1}$, plus
or minus an exponentially vanishing term; their distance would oscillate around
$\frac23 - \frac34 n^{-c}$ and converge exponentially fast, with the oscillation created by the negative
eigenvalue. This is what happens until the flock at $a_1$ begins to interact with its
"sibling" flock to the right, $(B_3, B_4)$. The latter's velocity vector is $(-n^{-c}, 0)^T$ at
time t = 1 and, for $t_1 \le t \le t_2$,

\[ x_3(t) = 2 - \tfrac12 n^{-c} \bigl( t + \tfrac34 + \tfrac14 (-3)^{1-t} \bigr) ; \]
\[ x_4(t) = \tfrac83 - \tfrac12 n^{-c} \bigl( t - \tfrac34 - \tfrac14 (-3)^{1-t} \bigr) . \tag{69} \]

The stationary velocity of $(B_3, B_4)$ is $-m_{a_1} = -\frac12 n^{-c}$, but the flock is not the
mirror image of $(B_1, B_2)$, a situation that would bring the flocking to an end. In
particular, note that the diameter of the flock is

\[ x_4(t) - x_3(t) = \tfrac23 + \tfrac34 n^{-c} \bigl( 1 - (-\tfrac13)^t \bigr) , \tag{70} \]

which always exceeds that of $(B_1, B_2)$ for all $t > 0$. The diameters of both flocks
oscillate around 2/3 but in phase opposition: indeed, their sum remains constant.



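Both 2-bird flocks drift toward each other at distance¹

\[ x_3(t) - x_2(t) = \tfrac43 - t\, n^{-c} . \tag{71} \]

This implies that $t_2 = \theta_1 + t_1 = \lceil \frac13 n^c \rceil$. Because n is an odd power of two and c
is odd, $n^c \equiv 2 \pmod 6$; hence, $\lceil \frac13 n^c \rceil = \frac13 (n^c + 1)$ and

\[ t_2 = \theta_1 = \tfrac13 (n^c + 1) \equiv 1 \pmod 2 . \]

We conclude that

\[ x_3(t_2) - x_2(t_2) = 1 - \tfrac13 n^{-c} . \tag{72} \]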


The definition of flip nodes suggests a cyclic process with period 2 that is inherent
to the flocking process. At time $t_2$, the flock at $a_2$ is formed with the initial
velocity

\[ v^{a_2} = \begin{pmatrix} P_1^{\theta_1} v^{a_1} \\ -P_1^{\theta_1} v^{a_1} \end{pmatrix} = \frac12 n^{-c} \begin{pmatrix} 1 + (-3)^{1-\theta_1} \\ 1 - (-3)^{1-\theta_1} \\ -1 - (-3)^{1-\theta_1} \\ -1 + (-3)^{1-\theta_1} \end{pmatrix} . \tag{73} \]

By (64), the stationary velocity for the 4-bird flock is $\frac13 (\frac12, 1, 1, \frac12)\, v^{a_2}$; hence, by (66, 71),

\[ 0 > m_{a_2} = \tfrac13 \bigl(\tfrac12, 1, 1, \tfrac12\bigr)\, v^{a_2} = -\tfrac16 n^{-c} (-3)^{1-\theta_1} = -\tfrac12 n^{-c} \bigl(\tfrac13\bigr)^{\theta_1} \ge -e^{-\Omega(m_{a_1}^{-1})} . \tag{74} \]

This inequality gives an inkling of the kind of exponential decay we envision as we
go up the fusion tree. Note that $m_{a_2} < 0$, which means that the flock is drifting
in the wrong direction: that is why $a_2$ is a flip node.

¹The linearity in t is due to an accidental cancellation that will not occur for bigger flocks.

Figure 22: The 4-bird flock is formed at time $t_2$ and acquires a negative
stationary velocity $m_{a_2}$.



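Flying at Height 2. Again, by (15), for $t_2 \le t \le t_2 + n^f < t_3$,

\[ \begin{pmatrix} x_1(t) \\ \vdots \\ x_4(t) \end{pmatrix} = \begin{pmatrix} x_1(t_2) \\ \vdots \\ x_4(t_2) \end{pmatrix} + \sum_{s=0}^{t - t_2 - 1} P_2^{\,s+1} v^{a_2} \quad \text{with} \quad P_2 = \frac13 \begin{pmatrix} 1 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 1 \end{pmatrix} . \]

By straightforward diagonalization, we find that, for any integer $s > 0$,

\[ P_2^{\,s} = \frac16 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} (1, 2, 2, 1) + \frac16 \Bigl(\frac23\Bigr)^{s} \begin{pmatrix} 2 \\ 1 \\ -1 \\ -2 \end{pmatrix} (1, 1, -1, -1) + \frac16 (-3)^{-s} \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix} (1, -2, 2, -1); \tag{75} \]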
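therefore,

\[ \begin{pmatrix} x_1(t) \\ \vdots \\ x_4(t) \end{pmatrix} = \begin{pmatrix} x_1(t_2) \\ \vdots \\ x_4(t_2) \end{pmatrix} + m_{a_2} (t - t_2) \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + n^{-c} \left\{ \frac18 \begin{pmatrix} 11 \\ 5 \\ -5 \\ -11 \end{pmatrix} + \frac23 \Bigl(\frac23\Bigr)^{t - t_2} \begin{pmatrix} -2 \\ -1 \\ 1 \\ 2 \end{pmatrix} + \frac{1}{24} (-3)^{t_2 - t} \begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix} \right\} . \tag{76} \]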



It follows from (68, 70) that, for $t_2 \le t \le t_2 + n^f$, both $x_2(t) - x_1(t)$ and $x_4(t) - x_3(t)$
are $\frac23 \pm \Theta(n^{-c})$; therefore, the two end edges of the 4-bird flock are safe, which we
define as being of length less than 1 (so as to belong to the flocking network) but
greater than 1/2 (so as to avoid edges joining nonconsecutive birds). The middle
one, (2, 3), is more problematic. Its length is

\[ x_3(t) - x_2(t) = x_3(t_2) - x_2(t_2) - \tfrac{1}{12} n^{-c} \bigl( 15 - 16 (\tfrac23)^{t - t_2} + (-3)^{t_2 - t} \bigr) . \]

We can verify that

\[ 15 - 16 \bigl(\tfrac23\bigr)^{t - t_2} + (-3)^{t_2 - t} \ge 0, \]

for all $t \ge t_2$, which, by (72), shows that the distance between the two middle
birds $B_2$, $B_3$ always lies comfortably between $1 - (\frac{19}{12} + o(1)) n^{-c}$ and $1 - \frac13 n^{-c}$. The
upper bound is both lucky and intuitive: lucky because the edge starts with length
very near 1 and it could easily be perturbed and break up; intuitive because the
two flocks have inertia when they bump into each other and one expects the edge
(2, 3) to act like a spring being compressed, thereby shrinking during the initial
steps.
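
The behavior of the middle edge can be cross-checked by iterating the 4-bird transition matrix exactly. The sketch below assumes the matrix $P_2$ with rows (1, 2, 0, 0)/3, (1, 1, 1, 0)/3, (0, 1, 1, 1)/3, (0, 0, 2, 1)/3 and the initial velocity of (73), with `eps` standing for $n^{-c}$ and `theta1` for $\theta_1$. It compares the simulated change in $x_3 - x_2$ with the closed form $-\frac{1}{12} n^{-c} (15 - 16 (\frac23)^{T} + (-3)^{-T})$ for $T = t - t_2$; the function names and the toy values `eps = 1/8`, `theta1 = 3` are illustrative.

```python
from fractions import Fraction

P2 = [[Fraction(a, 3) for a in row]
      for row in [(1, 2, 0, 0), (1, 1, 1, 0), (0, 1, 1, 1), (0, 0, 2, 1)]]

def apply(P, v):
    """Matrix-vector product with exact rationals."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def initial_velocity(eps, theta1):
    """Velocity of the 4-bird flock at its creation, as in (73)."""
    q = Fraction(-3) ** (1 - theta1)
    return [eps / 2 * s for s in (1 + q, 1 - q, -1 - q, -1 + q)]

def middle_edge_change(eps, theta1, steps):
    """Simulated x3 - x2 at time t2 + steps, minus its value at t2."""
    v = initial_velocity(eps, theta1)
    change = Fraction(0)
    for _ in range(steps):
        v = apply(P2, v)               # velocity at the next time step
        change += v[2] - v[1]
    return change

def middle_edge_formula(eps, steps):
    """Closed form for the same quantity."""
    return -eps / 12 * (15 - 16 * Fraction(2, 3) ** steps + Fraction(-3) ** (-steps))

eps, theta1 = Fraction(1, 8), 3
for T in range(12):
    assert middle_edge_change(eps, theta1, T) == middle_edge_formula(eps, T)
```

The change is never positive, so the edge only shrinks from its initial length, which matches the spring intuition given above.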



Flipping Velocity at Height 2. Since $a_2$ is a flip node, the velocity vector
reverses sign after a lag of $n^f$ steps. Instead of redoing all the calculations, we can
apply a simple symmetry principle: by linearity, the positions of the flock with
and without the flip average out to what it was at time $t_2 + n^f$ (Figure 23). In
other words, for $t_2 + n^f \le t \le t_3$,



hi{t2 + 



n 



\X4{t2 + n-f)^ 
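Figure 23: The 4-bird flock at $a_2$ "flips" at time $t_2 + n^f$.



By (76), the position formula for the flock can be readily updated:

\[ \begin{pmatrix} x_1(t) \\ \vdots \\ x_4(t) \end{pmatrix} = \begin{pmatrix} x_1(t_2) \\ \vdots \\ x_4(t_2) \end{pmatrix} + m_{a_2} (2 n^f + t_2 - t) \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + n^{-c} \left\{ \frac18 \begin{pmatrix} 11 \\ 5 \\ -5 \\ -11 \end{pmatrix} + \Bigl(\frac23\Bigr)^{n^f + 1} \Bigl( 2 - \bigl(\tfrac23\bigr)^{t - t_2 - n^f} \Bigr) \begin{pmatrix} -2 \\ -1 \\ 1 \\ 2 \end{pmatrix} + \frac{1}{24} (-3)^{-n^f} \bigl( 2 - (-3)^{t_2 + n^f - t} \bigr) \begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix} \right\} . \tag{77} \]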



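This proves that the lengths of the two end edges differ from what they were at $t_2$
by only $O(n^{-c})$. Indeed, by (68, 70), this implies that, for any $t \le t_3$,

\[ x_2(t) - x_1(t) = \tfrac23 \pm O(n^{-c}) ; \qquad x_4(t) - x_3(t) = \tfrac23 \pm O(n^{-c}) . \tag{78} \]

The middle edge has length

\[ x_3(t) - x_2(t) = x_3(t_2) - x_2(t_2) - \tfrac54 n^{-c} + 2 n^{-c} \bigl(\tfrac23\bigr)^{n^f + 1} \Bigl( 2 - \bigl(\tfrac23\bigr)^{t - t_2 - n^f} \Bigr) - \tfrac{1}{12} n^{-c} (-3)^{-n^f} \bigl( 2 - (-3)^{t_2 + n^f - t} \bigr) , \]

which, in view of (72), shows that, for $t_2 + n^f \le t \le t_3$,

\[ 1 - O(n^{-c}) \le x_3(t) - x_2(t) \le 1 - \tfrac13 n^{-c} . \tag{79} \]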



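This proves that the middle edge is safe and the integrity of the entire 4-bird flock
is preserved. Was it necessary to delay the flip by $n^f$? The particular choice of
lag, $n^f$, will be justified later by examining the bigger flocks, but we can see right
away that delaying the flip is mandatory. Indeed, if we replace $n^f$ by 0 in the
expression above, then, for $t = t_2 + 2$, we get

\[ x_3(t) - x_2(t) = x_3(t_2) - x_2(t_2) + \tfrac23 n^{-c} = 1 + \tfrac13 n^{-c} , \]

which causes the flock to break apart. The flock $(B_5, \dots, B_8)$ follows the same
trajectory as the 4-bird flock above, shifted along the X-axis by 4 but with no
velocity flip. So, by (67, 76), we find that, for $t_2 + n^f \le t \le t_3$,

\[ x_5(t) = x_5(t_2) + m_{a_2} (t - t_2) + \tfrac{11}{8} n^{-c} - 2 n^{-c} \bigl(\tfrac23\bigr)^{t - t_2 + 1} - \tfrac{1}{24} n^{-c} (-3)^{t_2 - t} ; \]
\[ x_5(t_2) = x_1(t_2) + 4 = \tfrac12 n^{-c} \bigl( t_2 + \tfrac34 + \tfrac14 (-3)^{1 - t_2} \bigr) + 4 . \]

At the same time, by (77),

\[ x_4(t) = x_4(t_2) + m_{a_2} (2 n^f + t_2 - t) - \tfrac{11}{8} n^{-c} + 2 n^{-c} \bigl(\tfrac23\bigr)^{n^f + 1} \Bigl( 2 - \bigl(\tfrac23\bigr)^{t - t_2 - n^f} \Bigr) + \tfrac{1}{24} n^{-c} (-3)^{-n^f} \bigl( 2 - (-3)^{t_2 + n^f - t} \bigr) , \]

where, by (69),

\[ x_4(t_2) = \tfrac83 - \tfrac12 n^{-c} \bigl( t_2 - \tfrac34 - \tfrac14 (-3)^{1 - t_2} \bigr) . \]

By (71), this shows that, for $t_2 + n^f \le t \le t_3$,

\[ x_5(t) - x_4(t) = \tfrac43 + t_2 n^{-c} + 2 m_{a_2} (t - t_2 - n^f) + \tfrac{11}{4} n^{-c} - 4 n^{-c} \bigl(\tfrac23\bigr)^{n^f + 1} - \tfrac{1}{12} n^{-c} (-3)^{-n^f} \]
\[ \phantom{x_5(t) - x_4(t)} = \tfrac53 + \bigl( \tfrac{37}{12} \pm o(1) \bigr) n^{-c} + 2 m_{a_2} (t - t_2 - n^f) . \]

Recall from (74) that $m_{a_2}$ is negative. This allows the distance $x_5(t) - x_4(t)$ to
fall below 1. This happens at $t_3 = t_2 + \theta_2$, where $\theta_2 = n^f + \Theta(|m_{a_2}|^{-1})$. Note that
$|m_{a_2}|$ is sufficiently small for the newly formed edge (4, 5) to be safe at time $t_3$.
We can see that from (74), which also shows that

\[ \theta_2 \ge \Omega(|m_{a_2}|^{-1}) \ge e^{\Omega(|m_{a_1}|^{-1})} . \tag{80} \]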






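We conclude this opening analysis with an estimation of the stationary velocity
$m_{a_3}$. The flipping rule causes the velocity of the flock $(B_1, \dots, B_4)$ to be reversed
at time $t_2 + n^f$. (It's a flip of type "left," so named because it involves a left child.)
Following the flip, the velocity of the 8-bird flock at $a_3$ is, at its creation,

\[ v^{a_3} = \begin{pmatrix} -P_2^{\theta_2} v^{a_2} \\ P_2^{\theta_2} v^{a_2} \end{pmatrix} . \]

By (73, 75),

\[ P_2^{\theta_2} v^{a_2} = \frac{1}{12} n^{-c} \left\{ -2 (-3)^{1 - \theta_1} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + 4 \Bigl(\frac23\Bigr)^{\theta_2} \begin{pmatrix} 2 \\ 1 \\ -1 \\ -2 \end{pmatrix} - 2 (-3)^{-\theta_2} \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix} \right\} . \]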

By (64 80), therefore. 



m, 



as 



7^2' 



1,1,1,1,1,1,^) 



42 



_pe2 ^aa 
pe2 ^aa 



1(1 

7 



(i, 0,0,-^)^2'^ V'^^ 



< e 



Since $m_{a_3} > 0$, the next flip must be of type "right," which happens to agree
with the flipping rule. Observe from (66, 74) how, as j increases from 1 to 3,
the stationary velocity $m_{a_j}$ decays from polynomial to exponential to doubly
exponential. To generalize this to further heights is not difficult. What's tricky is
to show that, despite all the symmetries in the system, the stationary velocities
never vanish. For example, if we formed new flocks by attaching to a smaller one
its mirror image, this would bring the drifting motion, and hence the flocking, to
an end. We summarize our findings below (66, 71, 74, 80):



m, 



■ai I 



in- 



ImagI < e" 



02 > 



0.2 



(81) 



4.2 Velocity Analysis



Our first task is to diagonalize the transition matrix $P_j$ given in (65). The Laplacian
acting on a path is akin to its acting on a folded cycle. Since the Fourier
transform over a finite cyclic group diagonalizes the one-dimensional Laplacian,
we can interpret the spectral shift as a linear operator acting on the Fourier
coefficients. We explain why below.
The Folded Cycle. The Fourier transform over the additive group $\mathbb{Z}_m$ provides
the eigenvectors $y_1, \ldots, y_m$ of the linear map $M$ defined by the circulant matrix

$$M = \frac{1}{3}\begin{pmatrix}
1 & 1 & 0 & \cdots & 0 & 1 \\
1 & 1 & 1 & \cdots & 0 & 0 \\
0 & 1 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 1 & 1 \\
1 & 0 & \cdots & 0 & 1 & 1
\end{pmatrix},$$

namely,

$$y_k = \bigl(1, e^{2\pi i(k-1)/m}, \ldots, e^{2\pi i(k-1)(m-1)/m}\bigr)^T.$$

The associated eigenvalue is equal to

$$\lambda_k = \frac{1}{3}\Bigl(1 + 2\cos\frac{2\pi(k-1)}{m}\Bigr).$$

We shall see shortly why using the notation $\lambda_k$, reserved for the eigenvalue of
$P_j$, is legitimate. To see the relation with $P_j$, set $m = 2n - 2$ and $n = 2^j$, and note
that the eigenvector coordinates $(y_k)_l$ and $(y_k)_{m+2-l}$ are conjugates. This implies
that $\Re\, y_k$ is a real eigenvector of $M$ that lies in the $n$-dimensional linear subspace

$$\mathcal{F} = \{\, x \in \mathbb{R}^m : x_l - x_{m+2-l} = 0 \ \text{for}\ 2 \le l \le n-1 \,\}.$$

Furthermore, it is immediate that $P_j$ is equivalent to the restriction of $M$ to $\mathcal{F}$; in
other words, folding the cycle in the middle by identifying opposite sides creates
the desired averaging weights (in particular, 2/3 at the end nodes) and transforms
the Fourier vectors into right eigenvectors for $P_j$ (hence the valid choice of the
notation $\lambda_k$). It follows that, for $1 \le k \le 2^j$,

$$u_k = \Bigl(1, \cos\frac{\pi(k-1)}{n-1}, \ldots, \cos\frac{\pi(k-1)(n-2)}{n-1}, (-1)^{k-1}\Bigr)^T$$

is the unique right eigenvector (up to scaling) of $P_j$ for $\lambda_k$. We note that, unlike
for the $m$-cycle, the transition for the $n$-path has only simple eigenvalues.
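The eigenpairs of the folded cycle can be checked numerically. The sketch below is only an illustration, not part of the paper: it assumes that the folded transition matrix has rows $(\frac13, \frac13, \frac13)$ at interior nodes and puts weight $\frac13$ on an end node and $\frac23$ on its unique neighbor (our reading of "2/3 at the end nodes"). The function names are hypothetical. It verifies $P u_k = \lambda_k u_k$ with $\lambda_k = \frac13(1 + 2\cos\frac{\pi(k-1)}{n-1})$, and that $(\frac12, 1, \ldots, 1, \frac12)$, suitably normalized, is stationary.

```python
import math

def path_matrix(n):
    """Transition matrix of the n-node path obtained by folding the (2n-2)-cycle (assumed form)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1 / 3
        if i == 0:
            P[i][1] = 2 / 3          # end node: both cycle neighbors fold onto the same node
        elif i == n - 1:
            P[i][n - 2] = 2 / 3
        else:
            P[i][i - 1] = P[i][i + 1] = 1 / 3
    return P

def eigenpair(n, k):
    """Claimed eigenvalue lambda_k and right eigenvector u_k, for 1 <= k <= n."""
    angle = math.pi * (k - 1) / (n - 1)
    lam = (1 + 2 * math.cos(angle)) / 3
    u = [math.cos(angle * l) for l in range(n)]   # last entry equals (-1)^(k-1)
    return lam, u

def eigen_residual(n, k):
    """max_i |(P u_k)_i - lambda_k (u_k)_i|; should vanish up to rounding."""
    P = path_matrix(n)
    lam, u = eigenpair(n, k)
    return max(abs(sum(P[i][l] * u[l] for l in range(n)) - lam * u[i]) for i in range(n))

def stationary_residual(n):
    """max_l |(pi^T P)_l - pi_l| for pi proportional to (1/2, 1, ..., 1, 1/2)."""
    P = path_matrix(n)
    pi = [1.0 / (n - 1)] * n
    pi[0] = pi[-1] = 0.5 / (n - 1)
    return max(abs(sum(pi[i] * P[i][l] for i in range(n)) - pi[l]) for l in range(n))

if __name__ == "__main__":
    n = 8  # a flock of 2^3 birds
    print(max(eigen_residual(n, k) for k in range(1, n + 1)))
    print(stationary_residual(n))
```

Because $\cos$ is strictly decreasing on $[0, \pi]$, the $n$ values $\lambda_k$ are distinct, which matches the remark that the $n$-path has only simple eigenvalues.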



Consider the evolution of a flock at node aj for j >1. Let 

7r= -^(il,...,l,i)^ and diagC,- = KCC^^TT^)- (82) 
Figure 24: The folded $m$-cycle. Identifying opposite nodes allows us to apply
the harmonic analysis of the cyclic group to the path-shaped flock.



For s > 1, we diagonalize the matrix = Ivr-^ + Qj, with by (|9|, 

Q',=^KCy'vkvlC-'^\ (83) 

k=2 

1/2 

where the right eigenvector Cj is proportional to with the normalization 
condition, \\vk\\2 = 1- By elementary trigonometry, it follows that, for any 1 < 
k < V, 

— 3 "T 3 '^'^^ 20-1 ' 

Vk= 5k[^, COS ^^fc^, ■ . . , COS , ^^^) , 

where $\delta_k = \sqrt{2}\,(2^j - 1)^{-1/2}$ for $1 < k < 2^j$ and $\delta_1 = \delta_{2^j} = (2^j - 1)^{-1/2}$. Recall that
$\theta_j = t_{j+1} - t_j$ is the lifetime of the flock $F_{a_j}$. By the triangle inequality and the
submultiplicativity of the Frobenius norm, for any $z$,

\\Q]42 < W E \\Cy'vkvlC-'/'zh < |A2r E \\Cy'\\F\\C-'/'Mzh 

k>l k>l 

<2i+i|l + |cos23^n|z||2. 
A Taylor series approximation shows that, for j, s > 1 and any z, 

||Q-2||2<e^'+^-^(^^"')||z||2. (84) 
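To illustrate the decay of $Q_j^s$ in the decomposition $P_j^s = \mathbf{1}\pi^T + Q_j^s$, the following sketch measures the largest entry of $P^s - \mathbf{1}\pi^T$ for a small path. It is an illustration under assumptions, not the paper's argument: the matrix form (weight $\frac13$ on an end node, $\frac23$ on its neighbor) is our reading of the folded cycle, and the helper names are hypothetical. The deviation should shrink by a factor close to $\lambda_2 = \frac13(1 + 2\cos\frac{\pi}{n-1})$ per step once the higher modes have died out.

```python
import math

def path_matrix(n):
    """Assumed folded-cycle transition matrix on n nodes."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1 / 3
        if i == 0:
            P[i][1] = 2 / 3
        elif i == n - 1:
            P[i][n - 2] = 2 / 3
        else:
            P[i][i - 1] = P[i][i + 1] = 1 / 3
    return P

def matmul(A, B):
    """Plain dense matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][l] for k in range(n)) for l in range(n)] for i in range(n)]

def deviations(n, steps):
    """Return [d_1, ..., d_steps], where d_s is the largest entry of |P^s - 1 pi^T|."""
    P = path_matrix(n)
    pi = [1.0 / (n - 1)] * n
    pi[0] = pi[-1] = 0.5 / (n - 1)
    M, out = P, []
    for _ in range(steps):
        out.append(max(abs(M[i][l] - pi[l]) for i in range(n) for l in range(n)))
        M = matmul(M, P)
    return out

if __name__ == "__main__":
    d = deviations(8, 101)
    lam2 = (1 + 2 * math.cos(math.pi / 7)) / 3
    print(d[100] / d[99], lam2)   # the two numbers should nearly agree
```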



We avoid decorating $\pi$ and $\mathbf{1}$ with subscripts when their dimensionality is obvious from the
context. We use $v_k$ instead of the notation $u_k$ from §3.2. One should be careful not to confuse
these eigenvectors with the velocities.
Figure 25: The spectrum of two colliding flocks: to produce a tower-of-twos, the first
Fourier coefficients must cancel each other and be replaced by a linear combination of 
the higher ones. This spectral shift must ensure that the new first Fourier coefficient is 
nonzero. This will automatically produce an exponentially decaying energy spectrum. 



Spectral Shift as Energy Transfer. After s steps following its creation, the 
flock Fj moves with velocity 

k=2 

where ak{s) = Xf.v'^Cj ^^"^v^k For A; > 1, the Fourier coefficients cy.k{s) decay 
exponentially fast with $s$ while the first one, the stationary velocity, remains
constant. What happens when another flock $G_j$ "collides" with $F_j$? A tower-of-twos
growth requires two events: one is that, within the algebraic expressions defining
the new Fourier coefficients, the stationary velocities should cancel out; the other
is that the new first Fourier coefficients should not be zero. For example, consider
a flock $G_j$ that is mirror image to $F_j$ and heads straight toward it. The two
stationary velocities would cancel out, but the new one would also be zero. Restoring
the dimension $Y$ would produce the spectrum on the left in Figure 26 and,
consequently, a vertical flying direction: this would dash any hope of achieving a
tower-of-twos.

The trick is to ensure that the energy contained in the higher Fourier
coefficients averages out in a way that produces a new stationary velocity that is
nonzero: in two dimensions, this will create a direction close to vertical but not
exactly so (right spectrum in the figure). The spectral shift can be viewed as a
transfer of energy from the $k$-th Fourier coefficients (for all $k > 1$) to the first
one. The issue is not how to produce exponentially fast decay but how to transfer
strictly positive energy. Too much symmetry wipes out all the energy in the first
Fourier coefficient, while too little symmetry produces a new stationary velocity
that is a nonzero average of the previous ones. The first case prevents future
collisions; the second one produces a new flying direction that deviates from vertical
by only a polynomially small angle.




Figure 26: Too much symmetry makes the first Fourier coefficient vanish (left box) and 
produces a vertical flying direction. Too little symmetry produces an excessive stationary 
velocity and a polynomially small nonzero angle with the Y direction. The right amount 
of symmetry produces an energy transfer from the decaying higher Fourier coefficients to 
the first one, thus creating an exponentially small angle (right box).



The Spectral Shift in Action. Since we are only concerned with velocities 
in this section, and not with positions, we may assume without loss of generality 
that all flipping is of the right type: in other words, we stipulate that the flock 
of any right child of height at least 2 should get its velocity reversed after the 
prescribed lag time. To restore the true flipping rule will then only be a matter of 
changing signs appropriately. With this simplifying assumption, the aggregating 
formula of (73) becomes, for j > 1, 

(85) 



The averaging operator $P_j$ cannot increase the maximum absolute value of the
velocity coordinates; therefore, by (66),

$$\|v^{a_j}\|_2 \le 2^{j/2}\|v^{a_j}\|_\infty \le 2^{j/2}\|v^{a_1}\|_\infty = 2^{j/2}(2n^{-c}).$$

In other words, for any $j \ge 1$,

$$\|v^{a_j}\|_2 \le 2^{j/2+1} n^{-c}. \qquad (86)$$
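The two facts used in this chain, that a row-stochastic averaging operator cannot increase the largest absolute coordinate and that $\|v\|_2 \le \sqrt{d}\,\|v\|_\infty$ in dimension $d$, can be observed on a small example. The sketch below is illustrative only: the matrix form is our assumed reading of the folded-cycle weights, and the random starting vector and function names are hypothetical.

```python
import math
import random

def path_matrix(n):
    """Assumed folded-cycle transition matrix on n nodes (each row sums to 1)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1 / 3
        if i == 0:
            P[i][1] = 2 / 3
        elif i == n - 1:
            P[i][n - 2] = 2 / 3
        else:
            P[i][i - 1] = P[i][i + 1] = 1 / 3
    return P

def apply(P, v):
    """One averaging step: each coordinate becomes a convex combination of old ones."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def norm_trace(n, steps, seed=0):
    """Record (sup norm, Euclidean norm) of the velocity vector before each step."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    P = path_matrix(n)
    trace = []
    for _ in range(steps):
        sup = max(abs(x) for x in v)
        two = math.sqrt(sum(x * x for x in v))
        trace.append((sup, two))
        v = apply(P, v)
    return trace

if __name__ == "__main__":
    for sup, two in norm_trace(8, 5):
        print(sup, two)
```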
Lemma 4.1 For any $j > 1$, the stationary velocity of the flock at node $a_j$ satisfies

Proof. The stationary distribution for a $2^{j-1}$-bird flock, being a left eigenvector,
is normal to the right eigenvectors; hence to $Q_{j-1}^{\theta_{j-1}}$. By (64, 85),



2J 



2J 



m 



23 - 1 



(if,. ..,1,^)1-''^ = :.— Ml 



2^- 



2J- 



(if 1 i i 1 1 i ) ^^-1 



2J- 



+ 



^(o.....o,i,i,o,,,..o)(_«g;;;_; 



2J 



By (|86(, therefore, Hu^^-ilb < 2(^+i)/2n"^ and, by (|84(, 
1 



23 - 1 



1 



< 

oo - 2J - 1 



□ 



From the spectral decomposition 



fc>i 



we see that the stationary velocity $m_{a_j}$ is the first Fourier coefficient, ie, the
spectral coordinate associated with the dominant eigenvalue 1. The cancellation
of the two copies of $m_{a_{j-1}}$ in the computation of that coefficient has the effect of
making $m_{a_j}$ a linear combination of powers of higher eigenvalues. That part of the
spectrum being exponentially decaying, the corresponding spectral shift implies a
similar exponential decay in the new first Fourier coefficient. This is the key to
the tower-of-twos growth. Indeed, as we show next, the next inter-flock collision
cannot occur before a number of steps inversely proportional to that first Fourier
coefficient.



Lemma 4.2 For any $j \ge 1$, $\theta_j = n^f + \Theta(|m_{a_j}|^{-1})$.



Proof. By (81), we can assume that $j > 1$. For $t_j < t < t_{j+1}$, the velocity of the
flock $F_{a_j}$ is of the form
where the sign changes after a flip. By (84, 86),



Summing over all t, our choice of c gives us the conservative upper bound, 



, ' V"' 2 < - 

n 

t>t. 



No bird belongs to more than log n different flocks, so its entire motion is specified 
by the stationary velocities of its flocks plus or minus an additive "vibration" error 
of o(l) on the bird's total displacement. 

Until one of them flips, the flock $F_{a_j}$ and the one at its sibling node $a'_j$ are
identical copies that have moved in lockstep. The distance between their leftmost
birds at time $t_j + n^f$ is what it was at time 0, ie, $2^j$. We postpone the integrity
analysis for later and simply assume that the flocks are, indeed, single-paths. This
implies that the diameter of $F_{a_j}$ is at most $2^j - 1$. By (78), its leftmost edge is of



length I lb o(l) between time and t^. Since the first two birds always share the 
same flock, the vibration bound above indicates that they always remain within 
distance | + o(l) of each other. The same bound also shows that, at time tj , 
both flocks have diameter at most 2-^ — | + o(l). By our previous observation, 
they must be at distance at least | — o(l). After flipping at time tj + ^ the two 
flocks head toward each otheip^ at a relative speed of 2|maj|, plus or minus an 
error speed that contributes a displacement of o(l). This implies that the time 
between flipping and merging is |(6 it o(l))ma . j^-*^. □ 



For j > 1, we find from Lemmas |4 . 1 1 and 4.2 that 

0, >l^(e^(^^-i^"^)). 



Since, by (71), 9i > n , it follows immediately by induction that, for any j > 1, 



> n 



yj-i, 



(87) 



where, for convenience, we define 9o = 1. This allows us to rewrite our previous 
lower bound in the slightly simpler fashion, 

n > gn(e,-i4-i) 



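for any $j \ge 1$. Note that the tower-of-twos lower bound on the flocking time
follows immediately from (88). Indeed, let $\hat\theta_j = \sqrt{\theta_j}$. By (81), $\hat\theta_1 \ge 2$ and, for
$j > 1$, $\hat\theta_j \ge 2^{\hat\theta_{j-1}}$; therefore, when $j$ reaches $\log n - 1$,

$$\theta_j \ge \hat\theta_j \ge 2 \uparrow\uparrow \log\frac{n}{2},$$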



^"^We must assume that the left flock flies to the right, so as to put it on a collision course 
with the other one, after flipping. Our argument is symmetric, however, and would work just the 
same if directions and flip types were reversed. 
Figure 27: Two flocks merge after a period inversely proportional to their stationary
velocities. For convenience, we temporarily assume that all flips are of right-type and 
that flocks fly to the right after they are created: these conditions will not always hold. 



which establishes the main lower bound of this paper.



□ 
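To get a feel for how fast this recursion grows, here is a minimal sketch of the tower-of-twos iteration: a base value 2 followed by repeated exponentiation $x \mapsto 2^x$. It only illustrates the growth rate of $2 \uparrow\uparrow h$; the function names are hypothetical, and the helper for the height assumes that $n$ is a power of two, so that $\log n - 1$ is an integer.

```python
def tower_of_twos(height):
    """Return 2 ^^ height, ie, a tower of `height` twos (exact big-integer arithmetic)."""
    value = 1
    for _ in range(height):
        value = 2 ** value
    return value

def tower_height_for(n):
    """Height log2(n) - 1 reached along the left spine, for n a power of two (n >= 4)."""
    assert n >= 4 and n & (n - 1) == 0, "n is assumed to be a power of two"
    return n.bit_length() - 2

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        h = tower_height_for(n)
        print(n, h, tower_of_twos(h))   # for n = 32: height 4 and value 65536
    # One more doubling of n (n = 64) already gives a number with 65537 bits.
    print(tower_of_twos(tower_height_for(64)).bit_length())
```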



For future use, we state a weak bound on stationary velocities. By Lemmas 4.1 



and 4.2 for j > 1, 



By (86), $|m_{a_j}| = |\pi^T v^{a_j}| \le \|v^{a_j}\|_\infty \le \|v^{a_j}\|_2 \le n^{1-c}$. It then follows from (66)
that



I ma, I < 



n 



if j = l; 
if j > 1. 



(89) 



It remains for us to prove that stationary velocities never vanish and that the 
flocks keep their structural integrity during their lifetimes. Note that the former 
would not be true if pairs of colliding flocks were mirror images of each other. The 
proof must demonstrate that the symmetries needed for the spectral shift do not 
cause more cancellations than needed. But, first, let us see why the flips conform 
to the noisy model. Both the number of perturbations and their timing fall well 
within the admissible bounds. The only nontrivial condition to check is that the 
change in velocity at flip time t = tj + {j > 1) is l5£*gO(" )_ The £2 norm of 
the change is 

5 



2||P* ^'v''^\\2 < 2Vra||u''-'||2. 



We prove below (117) that 



2 < ma 



Since/ = 3, by (87), 



t = + 01 + ■ ■ ■ + Oj-i < 2{ej-i - n^) ; 
therefore, by Lemma 4.2,



6 < 4n|m, 



0{n) 




< 



logt 




e. 




t 



which establishes the conformity to the noisy model.

To conclude the kinematic analysis, we must prove that no stationary velocity
$m_{a_j}$ ever vanishes. This is not entirely obvious in view of all the symmetries in the
system: this would happen, for example, if one flock were the mirror image of its
sibling.

Nonvanishing Velocities. We need to take a closer look at the dynamics of the 
system to show that flocks never grind to a halt. In doing so, we will uncover an 
iterated process of period 4 that allows us to give a full description of the velocity 
vector at any time. Again, we assume that all flipping is of type "right," which 
affects only the flocks at right children of height at least 2. 

Theorem 4.3 For any $j \ge 1$, the stationary velocity $m_{a_j}$ never vanishes. Its
direction is such that sibling flocks head toward each other to form bigger flocks.

Proof. For i > 1, define the 2-'-by-2-'~^ matrix 





For example, if j = 3 and 6j = 1 



( 1 



2 
1 
1 

-1 
-1 
-1 




0\ 

1 

1 1 

1 1 

1 

-1 

-1 -1 




1 



-1 
-1 
-1 





V 



-2 -1/ 



By (85), for j > 1 
and, at the end of its existence, the flock at $a_j$ has velocity (with right flips only):



FjP-ri v"^-' = (Jl Fi) 1 v"' . (90) 



Note that indices run down, as the products are not commutative. By (64), for
$j > 1$,



2-' 



" (1 1 1 1 



2( 



1 P^T'v"^-i\ 

-1,0, ...,0,1 V 



2(1 - 2i) 



1,0,. ..,0,1) 



(91) 



2(1 -2J) 
where = 1 if j = 2 and 



i=j-l 



23 



$z_{j,k} = (1, 0, \ldots, 0, (-1)^k)^T.$

We now look more closely at the structure of $F_j$, going back to the spectral
decomposition of $P_j^{\theta_j}$. By (83), for $j \ge 1$,



(92) 



where, for notational convenience, we subscript 1 to indicate its dimension; for 
any i > 1 and \ < k <2^ , 



fJ-j^k = (5 + i ^^2T^) ^ ' ^i.fc = 2 if /c < 2-?' and 2i = 1; 



1, cos 



^(fc-l) 7r(fc-l)(2:'-2) 
23 — 1 ' • • • ' 23 — 1 



p2J 



(93) 



Our algebraic approach requires bounds on eigenvalue gaps and on the Frobenius
norm of $Q_j^{\theta_j}$. Note that $|\mu_{j,k}| < 1$ for all $j \ge 1$ and $k \ge 2$. We need much tighter
bounds. Recall that $n$ is assumed large enough and define $\mu_{0,2} = 1$ for notational
convenience.
Lemma 4.4 For any j > 1, both |/ij,2//i"_i 2 1 \\Qj^ \ \f are less than e ^ ; fi
j > 1 and k > 2, so is the ratio IfJ-j^k/ f^jM- 



or 



Proof. We leave the bound on \\Q^-^ \\f for last. If j = 1, then /Xj^2 = (—3) and, 



by (87), |/ij,2| < e " . Since /Uo,2 = 1, this proves the first upper bound for j = 1 
Suppose now that j > 1. For 2 < k < 2^ , |l + 2 cos 



view of the fact that j < logn and, by (87), 9j > n , for all > 2, 



^1 < |l + 2cos23^|. In 



< \l^j,2\ < 0{2-^)e 



< e" 



(94) 



By (87), 



< e~ 



Vi-1,2 



The last inequality follows from the fact that 2^ ^3 < |/Xj_i^2| < 1- To bound 
the ratio \fJ-j^k/ fJ-jM for J > 1 and k > 2, we begin with the case j = 2 and verify 

3 

directly that e"" is a valid upper bound. Indeed, 



((^^2 + 1 



if /c = 2; 
if /c = 3; 



Assume now that j, k > 2. Then — 1 < 1 + 2 cos < 1 + 2 cos ^jZi ■ Since 

1 + 2cos27i;t > 1, |1 + 2cos^^Jff^| < |1 + 2 cos 27311; therefore. 



1 + 2 COS 



2lT ' 

2^-1 



1 + 2 COS 



(2 COS 23^-1)^ 



^n(6»j4-j) 



< e~ 



For all j > 1, by (94) and the submultiplicativity of the Frobenius norm, 

\\Q/\\f <2_^\fJ-j,k\ X ||Uj,fe||2 (||%,fc||2 + siki.fc-llb) 
k=2 

<2O0-)|/.,-2| <2O0-)e-""' <e-""\ 



□ 



For J > 1, we express Fj, the "folded" half of Pj^ , by subtracting the lower 
half of 

Uj^k ~ \zj^k—i from its upper half, forming 



Wj-l,k — (Clj • • • 1 '?2J-0"^ ~ \zj-i,k-, 



(95) 



where 



ii = cos 



7r(fc-l)(Z-l) 7r(fc-l)(2J-l+;-l) 



2J~1 



COS 



2J-1 
It follows from (82, 92) that, for $j > 1$,



~ 2(1-2^) -^2^^7-1,1 +y^,f^j,kUj,kwJ_i 



k- 



(96) 



k=2 



To tackle the formidable product Hi (90), we begin with an approximation 
Hi Gj, where 



~ 2(l-2J)-^2^^J-l,l + H,'2Uj,2wJ_i 2- 

Setting k = 2, we find that 



1 vr 7r(2-''-2) .V 

l,COS 2731, • ■ ■ , COS ^2j_i , -1 ) 



For < Z < 2^-1, 
This extends to the case j = 1, so that, for any j > 1, 



COS 273Y + COS 21-1 ^ 



Wj,2 = {Ul,-- ■ ,U2j-i,-U2j-i, . . ■ , -Wl)^; 



= COS . 



For k = 2, we simplify into 

?z = COS + sm 2j_i , 

for 1 < / < 2-^^^, which shows that = ^2J-i+i-/j therefore, for j > 1, 



'Wj-1,2 = {Wl, . . . ,W23-2,W2j-2, . . .,wi)'^; 

1 , . 7r/2 
= 2 + sm 2731; 



= COS 



23-1 



+ sm 



2J-1 



(1< Z < 2^- 



By (97),for j>2. 



Gj — I 2(1-2') -'-2'^^^l,l + 



i=j-l i=i-l 



^J■^,2Ui,2W^ 



(97) 



(99) 



(100) 



Expanding this product is greatly simplified by observing that, by (98 99), for 
any j > 1, 



^lihi = wj2Uj,2 = 0; 



4i«i,2 = 2; 



(101) 



^ WJ2I2, =^ 7j, where 2^-1 - 1< 7^ < 2^+^ - 1. 
To prove the bounds on $\gamma_j$, we rely on (99),

2J-1 



7i 



and the fact that 2^11 ^\ < | a nd ^2-1 +^-1' ^ f ' which the two inequahties 

in dToTl) follow readily. By (ffool), for j > 2, 



/=i 



cos 



T3T + sm 



2J+1-1 



23+1-1 / ' 



tt{1-1/2) 



7^ 



i=j-l 



z 

1,1 (n {2(l^)h^'ll,l+^^^,2U^,2wl,^2})Pl'v'''■ (102) 



i=j-l 



If we drop all sub/superscripts and expand the scalar expression above, we find
a sum of $2^{j-2}$ words $z a_{j-1} \cdots a_2 P_1^{\theta_1} v^{a_1}$, where each $a_i$ is of the form $\mu u w$ or
$\mathbf{1} z$ (suitably scaled). By (101), however, the only nonzero word is of the form
$A = z(\mu u w)(\mathbf{1} z)(\mu u w)(\mathbf{1} z) \cdots P_1^{\theta_1} v^{a_1}$. This necessitates distinguishing between
even and odd values of $j$.




Figure 28: If j is odd, the word A is of the form 



Case I. {odd j > 2): It follows from (101 ) that 



z^ 



1.1 ( n 



i=j-l 



6 

Zj_iiflj^l^2Uj-l,2Wj__2^2 11 \2(l^:¥)-^2iZi-l,lfJ-i-l,2U' 

odd i=j—2 

3 

2flj-i^2wj_2^2 n {l^h' f^i-^, •2^1-2,2} = (^f'^wf^ 



T 

i-l,2Wi_2^2 



odd i=j—2 
where



af'^ = 2(-l)(^-+i)/V2,2 n 

odd i=j—2 



(103) 



One must verify separately that this also holds for the case j = 3, where 77, = 1 



Recall that, by (66 



99), wi^2 = (1;!)"^ and ||f"^||2 = V^n By Lemma 



4.4 



and 



the submultiplicativity of the Frobenius norm, 

|<2Q? < lki,2||2||Q?| 



By ([921 , it follows that 
and 



(104) 



A - 7'^ 



n O'Yi 

i=j-i 

odd( a-i , ai 



Si ai 



V 



(105) 




Case II. {even j > 2): 
2 



1.1 ( n = ^7-1.1 n {/^i.2Wi,2^«j^l,2(2(T3^-^)l2'-i^*-2,l} 

i=j — l oddi=j — 1 

3 

'J-l.l n {( 2(l-2'-l) )/^'.2^».27»-l^»^2,l} = Pj^l,!^ 



oddi=j— 1 
where

oddj=j— 1 

It follows that 

2 

^ = ^j-M ( n cO^'f^"^ = pap^^ v^k 

i=j-l 



By (192|, 

t;"! = z?;i(l27r^ + Ql'K^ = zl^Q\- v^^ = /zi,2« - v^^)- (106) 

therefore, 
where 



A = af""Ki -t;^^), (107) 



«r" = (-i)^'^'"'Vi,2 n (108) 

oddi=j— 1 

This concludes the case analysis. Next, we still assume that $j > 2$ but we
remove all restriction on parity. Recall that $G_i$ is only an approximation of $F_i$
and, instead of (102), we must contend with
i=j-i 

2 T- 
i=3-\ k=2 

If, again, we look at the expansion of the product as a sum of words

$B = z a_{j-1} \cdots a_2 P_1^{\theta_1} v^{a_1},$

then we see that each $B$-word is of the form

$z(\mu u w)\{\mathbf{1} z, \mu u w\}\{\mathbf{1} z, \mu u w\}\{\mathbf{1} z, \mu u w\} \cdots P_1^{\theta_1} v^{a_1},$

where $\mu$, $u$, $w$ are now indexed by $k$. Recall that previously the only word was of
the form $A = z(\mu u w)(\mathbf{1} z)(\mu u w)(\mathbf{1} z) \cdots P_1^{\theta_1} v^{a_1}$. There is no need to go over the
entire analysis again. By showing that $|B|$ is always much smaller than $|A|$, we
prove

Lemma 4.5 For any 2 < j < logn, 



i=j-l 



(l + e;)(r;r-<^)ar" else, 



where Sn-,s'^ are reals of absolute value 0{e "). 
Proof. Note that, by (Il0l|, 7i > 2*"^ - 1 for any i > 1. Also, byj66|, v^^ +^2'



n and v,^ 
2 < j < log n. 



^1 ^,^1 



3n-^ It follows from (103 105 107, 108) that, for any 



\A\ 



-1,1 ( n G.jp^^v'^^ 

i=j-l 



l\c+i j |/i2,2A^4,2 • • •^j-i,2| if j is odd; 



lAii,2/^3,2 • • •Mj-i,2| else. 



(110) 



(111) 



We take absolute values on the right-hand side for notational consistency: all the
factors, defined in (93), are strictly positive, except for $\mu_{1,2} = (-\frac{1}{3})^{\theta_1}$ which,
by (71), is equal to $-3^{-\theta_1} < 0$, ie, for $i > 1$,

$\mu_{1,2} < 0 < \mu_{i,2}.$

Let's extend our notation by defining, for i > 1, 

';U.,i = i(l-2^)-^ 

= I2* ) 

Then, any S-word is specified by an index vector {kj-i, . . . , ^2): 

2 



B, 



fe^_l,...,fc2 



w 



Observe that the $A$-word we considered earlier is a particular $B$-word, ie,



A = B 



2,1,2,1,. 



j-2 



Since we wish to show that all the other $B$-words are considerably smaller, we
may ignore the settings of $k_i$ that make a $B$-word vanish. All the conditions on
the index vector are summarized here:



$1 \le k_i \le 2^i$;
$k_{j-1} \ne 1$;
$k_i k_{i-1} \ne 1$ $(2 < i < j)$.



By ([93j[95|, for ahi > land A; > 1, ||ni_A;||2 < 2^/^ and for i. A: > 1 
so, by Cauchy-Schwarz, for i > 2 and k,l > 1, 



' ll^i,fc||2 



(112) 



< 2i/2+2. 



wf_,kUi-i,i\ < 2 
Since 2 < j < log n,



2 

. n 

i=j-2 



therefore, 



i=j-l 

We prove that all $B$-words are much smaller than $A$ in absolute value.
Lemma 4.6. All $B$-words distinct from $A$ satisfy:



\Bi 



'j-i. 



Proof. Since Pi is stochastic, by (|66 



1,^2-^1 



0(n-^) < 1, 



and the upper bound (113) becomes 



(113) 



i=j-l 



(114) 



To maximize the right-hand side of (114), we may replace any instance of
$k_i > 2$ by $k_i = 2$ (Lemma 4.4). This does not contradict conditions (112) since no
index is set to 1. Note the importance for this step of having removed all vectorial
presence from (114). We assume that the new $B$-word is not $A$, so its index vector
is not of the form $(2,1,2,1,\ldots)$; therefore, if we end up with this very pattern, and
hence with $A$, obviously at least one index replacement must have taken place.



By Lemma 4.4, any such replacement causes an increase by a factor of at least 



and Lemma 



4.6 



follows. So, we may assume now that $k_i \in \{1, 2\}$ and
$(k_{j-1}, k_{j-2}, \ldots, k_2) \ne (2,1,2,1,\ldots)$.

Scan the string $(k_{j-1}, \ldots, k_2)$ against $(2,1,2,1,\ldots)$ from left to right and let $k_a$
be the first character that differs. By (112), $k_{j-1} = 2$, so $2 \le a \le j - 2$; hence
$j > 3$. Since we cannot have consecutive ones, $k_a = 2$ and $j - a$ is even. By (110)



and Lemma 4.4 



\B, 



fc,_l,...,fc2 



< (n' 



c+lj^2 logn 



/^i-2,1 • • • /^a+1,2 Aia,2 Ma-l,fca-i ' " " M2,A:2 I 



< n 



\lJ'j-l,2 fJ'j-3,2 • • • lJ'a+1,2 fJ'a-1,2 /^a-3,2 • • • 
31ogn l/^i-2.1 • • • /^a+2,1 l^a,2 /^a-l,fc„_i " " " ^^2,k2\ 

l/^a-1,2 /^a-3,2 • • • | 
Figure 30: The top horizontal line represents $k_i = 1$. The white dots below
the line correspond to $k_i = 2$. The $B$-word in white is brought into canonical
form (black jagged line) by setting all the indices $k_i > 2$ to 2. This cannot
cause the magnitude of $B$ to drop. We may also assume that the end result
is not the $A$-word, as this would cause an exponential growth in line with the
lemma.



The first numerator mirrors the index vector of the $B$-word accurately. For the
denominator, however, we use the lower bound of (110). The reason we can afford
such a loose estimate is the presence of the factor $\mu_{a,2}$, which plays the central role
in the calculation by drowning out all the other differences. Here are the details.
All $\mu$'s are less than 1 and, by Lemma 4.4,



IfJ-a-iM < \fJ'a-i,2\; therefore. 



' j - 1 1 ■ • ■ 1 



fc2l 



31ogn IMq.sI 



31ogn l/^Q.2| 



\A\ 



l.logn I 
\f^a-l,2\ 



l^a-l,2l 



which proves Lemma 4.6 



□ 
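The scanning step in the proof of Lemma 4.6 is purely combinatorial and can be checked by brute force. The sketch below enumerates the canonical index vectors $(k_{j-1}, \ldots, k_2)$ over $\{1, 2\}$ with $k_{j-1} = 2$ and no two consecutive ones, finds the first position $a$ where a vector departs from $(2,1,2,1,\ldots)$, and reports $(a, k_a)$. It is only an illustration of the claim that $k_a = 2$, $2 \le a \le j-2$, and $j - a$ is even; the function names are hypothetical and $j \ge 3$ is assumed.

```python
from itertools import product

def pattern(j):
    """The alternating index vector (2, 1, 2, 1, ...) of length j - 2."""
    return tuple(2 if p % 2 == 0 else 1 for p in range(j - 2))

def canonical_vectors(j):
    """Vectors (k_{j-1}, ..., k_2) over {1, 2} with k_{j-1} = 2 and no two consecutive 1s (j >= 3)."""
    for ks in product((1, 2), repeat=j - 2):
        if ks[0] != 2:
            continue
        if any(a == 1 and b == 1 for a, b in zip(ks, ks[1:])):
            continue
        yield ks

def first_divergence(ks):
    """Return (a, k_a) for the first index a where ks leaves the pattern, or None if it never does."""
    j = len(ks) + 2
    for p, (k, q) in enumerate(zip(ks, pattern(j))):
        if k != q:
            return j - 1 - p, k      # position p of the tuple holds k_{j-1-p}
    return None

if __name__ == "__main__":
    # The index vector shown in Figure 31 (j = 11): diverges at a = 5 with k_a = 2.
    print(first_divergence((2, 1, 2, 1, 2, 2, 1, 2, 2)))   # (5, 2)
```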



There are fewer than n^°s" words; so, by Lemma 4.6 their total co ntribu tion 
amounts to at most a fraction n^og"^e~" of |^|. In other words, by (109), for 
j>2, 

( n ^0^1^ = (1 ± o{e~n)zu,, ( n G.)p^ , 



i=j-l 



i=j-l 



and the proof of Lemma 4.5 follows from (66 105 107). 



□ 



Recall from (91) that, for j > 1, 

1 



2(1 - 2j] 



7^ 



i=j-l 
Figure 31: We trace the index vectors of the $A$- and $B$-words from left to
right until they diverge ($i = a$). In this case, $j$ is odd and the index vector of
the $B$-word is $(2,1,2,1,2,2,1,2,2)$.



We know from (101, 103, 108, 111) that neither $\alpha_j^{\mathrm{odd}}$ nor $\alpha_j^{\mathrm{even}}$ is null. By
Lemma 4.5 it then follows that the stationary velocity $m_{a_j}$ never vanishes for
$j > 2$. By (66, 74), this is also the case for $j = 1, 2$. To be nonnull is not enough,
however: sibling flocks must also head toward each other. This is what the flipping
rule ensures. We next show how.



Drifting Direction. 



By (66 74), m^^ < < m^i- 



By Lemma 4.5 for j > 2 



1 \{l + en)K'+vT) 
2(1-2^-) hi + e'Jivr - v:^^) 



an^,odd 
,even 



if j is odd; 
else. 



(115) 



We observed in (111) that $\mu_{j,2}$ is positive for all $j \ge 1$, with the exception of
$\mu_{1,2} < 0$. By (103), the sign of $\alpha_j^{\mathrm{odd}}$ is that of $(-1)^{(j+1)/2}$. On the other hand,
by (108), the sign of $\alpha_j^{\mathrm{even}}$ is that of $(-1)^{j/2}$. By (66), this proves that, for $j > 0$,
the sign of $m_{a_j}$ is positive if and only if $j = 0, 1 \pmod 4$. Remember that this
is what happens when all the flips are confined to the right children of height
$j \ge 2$, what we called right-type flips. The actual rule is more complex. It applies
to flocks at left children of nodes of odd height at least 3 and to flocks at right
children of nodes of even height at least 4. We verify that, after the appropriate
flip, if any, every $m_{a_j}$ is positive, ie, all the flocks along the left spine of the fusion
tree $T$ drift to the right, as they should. But, before we show this, let's convince
ourselves that right-type flips alone would not do: indeed, note that $m_{a_2} < 0$, so a
right-type flip for the right child of $a_3$ would send the two flocks flying away from
each other (Figure 32).



Here is a quick proof of the soundness of the true flipping rule. Suppose
we follow the right-type rule. How do we then modify the velocities to end up
with the same sign assignment produced by the true flipping rule? The answer is
simple: reverse the sign of the velocities of the flocks at both children of nodes of
odd height at least 3. For $j > 2$, the velocity of the flock at $a_j$ will be effectively
reversed a number of times equal to $\lfloor (j-1)/2 \rfloor$. The velocity is effectively changed

Figure 32: A right-type flip would make the two 4-bird flocks drift away
from each other.



only when that number is odd, ie, when $j = 0, 3 \pmod 4$. Recall that $m_{a_j} > 0$
if $j = 0, 1 \pmod 4$ and $j > 0$. That implies that $m_{a_j}$ is now positive exactly
when $j = 1, 3 \pmod 4$, ie, $j$ is odd. When $j$ is even, however, the node $a_j$, being
a left child of an odd-height node, undergoes a flip, which therefore reverses its
stationary velocity and makes it positive. So, in all cases, $m_{a_j}$ is either positive or
made positive after the lag time for a flip: the corresponding flock is then headed
on a collision course with its sibling. Note that, as we observed in the footnote of
the proof of Lemma 4.2, our previous analysis leading to the tower-of-twos growth
still holds despite the restoration of the true flipping rule. This concludes the
proof of Theorem 4.3. □
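The parity bookkeeping in this argument is easy to tabulate. The sketch below is just a restatement of the three claims made in the text, with hypothetical function names: under right-type flips only, the sign of $m_{a_j}$ is positive iff $j \equiv 0, 1 \pmod 4$; the correction reverses the velocity $\lfloor (j-1)/2 \rfloor$ times; and, for even $j$, the node $a_j$ undergoes a flip of its own. It checks that the resulting sign is positive for every $j$ tested.

```python
def sign_right_type_only(j):
    """Sign of the stationary velocity at a_j when only right-type flips are used (j >= 1)."""
    return 1 if j % 4 in (0, 1) else -1

def effective_reversals(j):
    """Number of sign reversals inherited from the correction, as stated in the text."""
    return (j - 1) // 2 if j >= 2 else 0

def sign_before_own_flip(j):
    """Sign after applying the inherited reversals, before a_j's own flip (if any)."""
    s = sign_right_type_only(j)
    return -s if effective_reversals(j) % 2 == 1 else s

def sign_after_true_rule(j):
    """For even j, a_j is a left child of an odd-height node and flips once more."""
    s = sign_before_own_flip(j)
    return -s if j % 2 == 0 else s

if __name__ == "__main__":
    for j in range(1, 13):
        print(j, sign_right_type_only(j), sign_before_own_flip(j), sign_after_true_rule(j))
```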



It remains for us to establish the structural integrity of the flocks throughout 
their lifetime. But, before we do so, it is useful to revisit the spectral shift and its 
parity structure. 

The Hidden Periodicity of the Spectral Shift. The formula for the stationary
velocity in (115) reveals a built-in periodicity that illustrates a fundamental
aspect of the spectral shift. Looking at (110), one may wonder why the second
largest eigenvalues all appear with the same index parity: odd when $j$ is even
and vice versa. Think of the velocity of a flock as being well approximated by
$\sigma\mathbf{1} + \gamma u$, where $\sigma$ is the speed of its drift and $\gamma u$ is its vibration vector pointing
in the direction of the second right eigenvector scaled by a Fourier coefficient $\gamma$
decaying exponentially fast with time. Take the time to be right before merging
with the flock's sibling. Then the velocity of the new flock is of the form

$$\begin{pmatrix} \sigma\mathbf{1} + \gamma u \\ -\sigma\mathbf{1} - \gamma u \end{pmatrix}.$$

We approximate the transition matrix $P_j^{\theta_j}$ as $\mathbf{1}\pi^T + \mu_{j,2}R$, where $R$ is a fixed
matrix of rank 1. After $\theta_j$ steps, the velocity becomes roughly (ignoring
time-independent factors):

$$(\mathbf{1}\pi^T + \mu_{j,2}R)\begin{pmatrix} \sigma\mathbf{1} + \gamma u \\ -\sigma\mathbf{1} - \gamma u \end{pmatrix} \approx \gamma\mathbf{1} + \sigma\mu_{j,2}\,w,$$

where $w$ is a unit vector. We ignore the lower-order term $\mu_{j,2}\gamma$. It thus appears
that the pair $(\sigma, \gamma)$ becomes $(\gamma, \sigma\mu_{j,2})$ for the bigger flock. Note the alternation
between $(\sigma, \gamma)$ and $(\gamma, \sigma)$. In particular, the switch of $\gamma$ from the right to the
left position in the pair captures the spectral shift underlying the flocking process,
while the contrary motion of $\sigma$ indicates a re-injection of the first Fourier coefficient
into the spectral mix. In general, we have the relation $(\sigma_{j+1}, \gamma_{j+1}) = (\gamma_j, \sigma_j\mu_{j,2})$;
hence,

$$(\sigma_{j+2}, \gamma_{j+2}) = (\sigma_j\mu_{j,2},\ \gamma_j\mu_{j+1,2}).$$

This shows that $\sigma_{j+2} = \mu_{j,2}\,\sigma_j$, which explains the parity-based grouping
of (103, 108). Of course, the hard part is to show that none of these terms vanish.
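The two-step relation can be seen by iterating the heuristic recurrence $(\sigma_{j+1}, \gamma_{j+1}) = (\gamma_j, \sigma_j \mu_{j,2})$ directly. The sketch below uses exact rational arithmetic and made-up values for $\sigma_1$, $\gamma_1$ and $\mu_{j,2}$ (they are placeholders, not the paper's quantities), and shows that $\sigma$ picks up only the factors $\mu_{j,2}$ whose indices share one parity.

```python
from fractions import Fraction

def iterate_pairs(sigma1, gamma1, mu, last):
    """Iterate (sigma, gamma) -> (gamma, sigma * mu[j]) for j = 1, ..., last - 1.

    mu[j] stands in for mu_{j,2}; returns a dict j -> (sigma_j, gamma_j).
    """
    pairs = {1: (sigma1, gamma1)}
    for j in range(1, last):
        sigma, gamma = pairs[j]
        pairs[j + 1] = (gamma, sigma * mu[j])
    return pairs

if __name__ == "__main__":
    # Hypothetical illustrative values only.
    mu = {j: Fraction(1, 2 ** j) for j in range(1, 10)}
    pairs = iterate_pairs(Fraction(3), Fraction(5), mu, 9)
    for j in range(1, 8):
        sigma_j, gamma_j = pairs[j]
        sigma_next2, gamma_next2 = pairs[j + 2]
        # sigma_{j+2} = mu_{j,2} sigma_j and gamma_{j+2} = mu_{j+1,2} gamma_j
        print(j, sigma_next2 == mu[j] * sigma_j, gamma_next2 == mu[j + 1] * gamma_j)
```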



Note, in particular, that the vector

$$\mathbf{1}\pi^T \begin{pmatrix} \gamma u \\ -\gamma u \end{pmatrix}$$

comes frighteningly close to vanishing. A little bit of symmetry in the wrong
place is enough to derail the spectral shift. A uniform stationary distribution, for
example, would destroy the entire scheme; so would a vector $u$ with the same first
and last coordinates.



4.3 Integrity Analysis 



We saw in Section 4.1 that the flocks of size 2 and 4 remain single paths during their
lifetimes. The following result establishes the integrity of all the flocks. Though
not stated explicitly, the result also asserts that the birds $B_1, \ldots, B_n$ always appear
in that order from left to right.

Theorem 4.7 Any two adjacent birds within the same flock lie at a distance be- 
tween 0.58 and 1. This holds over the entire lifetime of the flock, whether it flips 
or not. 

Proof. As is sometimes the case, it is simpler to prove a more complicated bound, 
from which the theorem follows. For notational convenience, put rriag = \n~^ 
and define h{i) as the height of the nearest common ancestor of the two leaves 
associated with Bi and ;Bi+i; eg, /i(l) = 1 and h{2) = 2. We prove by induction 
on j that, for any 1 < j < logn, tj <t < tj+i, and 1 < i < 2^ , 



1 - |(n^ + jn^)|ma^(^j_J < BiSTtiBi,Bi+i) 

fl if i = 2^-1 and i = , , 

I 1 - 4(1 - n)l"^'^hW-il else. 
Recall that $a_0$, $a_1$, etc, constitute the left spine of the fusion tree $T$. By (89), the
upper and lower bounds above fall between 0.58 and 1, so satisfying them implies
the integrity of the flocks along the spine: indeed, the upper bound ensures the
existence of the desired edges, while the lower bound greater than $\frac{1}{2}$ rules out edges
between nonconsecutive birds. To extend this to all the flocks, and hence prove the
theorem, we establish (116) for nondeterministic flipping, ie, assuming that any
node may or may not flip regardless of what the true flipping rule dictates. The
issue here is that the left spine does not represent all flocks: reversing velocities
changes the positions of birds irreversibly, so technically we should prove (116)
not just along the left spine but along any path of $T$. We can do this all at once
by considering both cases, flip and no-flip, at each node $a_j$.

We proceed by induction on $j$. Before we get on with the proof, we should
explain why the upper bound of (116) distinguishes between two cases. In general,
once two consecutive birds are joined in a flock, they stay forever at a distance
strictly less than 1. There is only one exception to this rule: at the time $t$ when
they join, the only assurance we can give is that their distance does not exceed
1; it could actually be equal to 1, hence the difficulty of a nontrivial upper bound
when $t = t_j$ and $i = 2^{j-1}$. The case $j = 1$ is special because two-bird flocks never
flip but are provided with two different kinds of initial velocities; therefore, we
must check both $(B_1, B_2)$ and $(B_3, B_4)$. We verify (116) directly from (68, 70).
Indeed, for $t_1 \le t \le t_2$, 



n " < X2it) - xi{t) < X4(t) - X3{t) <l+n 



Assume now that $j \ge 2$. By applying successively (84, 86), Lemma 4.2 and (89), 
we find that 



By (85), 




m, 



'-23- 




The $\pm$ leaves open the possibility of a flip of either type, right or left, before the 
$2^{j-1}$-bird flocks join at time $t_j$. As we saw earlier, the choice of type ensures that 
the flock with the lower-indexed birds drifts to the right while its sibling, with the 
higher-indexed birds, flies to the left; hence the certainty that, after flipping, the 



"fixed" part of the velocity vector v""^ is of the form |m, 
fact, to achieve just this is the sole purpose of flipping.) 



T 



It follows that 



(In 



■j-i I 



with 



2 < ma 



(117) 
For $1 \le i < 2^j$, define 

$$x_i = (0,\dots,0,-1,1,0,\dots,0)^T,$$

the vector of length $2^j$ with $-1$ in position $i$ and $1$ in position $i+1$. 



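By (83), for $s \ge 1$, 

$$x_i^T P_j^s v^{a_j} = m_{a_j}\, x_i^T \mathbf{1} + x_i^T Q_j^s v^{a_j} = x_i^T Q_j^s v^{a_j};$$

hence, for $t_j \le t \le t_{j+1}$, 

$$\mathrm{DIST}_t(B_i,B_{i+1}) = \mathrm{DIST}_{t_j}(B_i,B_{i+1}) + \sum_{s=1}^{t-t_j} (-1)^{f(s)}\, x_i^T Q_j^s v^{a_j}, \qquad (118)$$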



where $f(s) = 1$ if there is a flip and $s > n^f$, and $f(s) = 0$ otherwise. Note that 
there is no risk in using $\mathrm{DIST}_t(B_i,B_{i+1})$ instead of the signed version, 
$x_{i+1}(t) - x_i(t)$, that birds might cross unnoticed: indeed, the bound in (86) applies to all 
the velocities, so that distances cannot change by more than 0{rr'^) in one step. 
This implies that a change of sign for $x_{i+1}(t) - x_i(t)$ would be preceded by the drop 
of $\mathrm{DIST}_t(B_i,B_{i+1})$ below $\frac12$ and a violation of (116). By Cauchy-Schwarz and 
(84, 117), 



and, since n is assumed large enough, for s > 1, 



■n— r2(s/n^) 



m, 



IxlQiCl < e 



(119) 



Likewise, 



<^1.45g-n(s/n2). 



\nia,_,Wn+\\Q\2} 



For s > 1 and 1 < i < 2^ by (117), 



-n(s/n2) 



(120) 



Recall that $j \ge 2$. To prove (116), we distinguish between two cases: whether the 
birds $B_i, B_{i+1}$ are joined at node $a_j$ or earlier. 

Case I. ($i = 2^{j-1}$): The edge $(i, i+1)$ is created at node $a_j$ and $h(i) = j$, where 
$2 \le j < \log n$ (Figure 33). We begin with the case $t = t_j$. By construction, the 
upper bound in (116) is equal to 1. To establish the lower bound, we observe that 
at time $t_j - 1$ the two middle birds were more than one unit of distance apart. 
Figure 33: The birds $B_i$ and $B_{i+1}$ are joined together at time $t_j$. 



By the expression of the velocity given in (117), which expresses the displacement 
prior to $t_j$, neither bird moved by more than $(1 + e^{-n})|m_{a_{j-1}}|$ in that one step; 
therefore, 

$$\mathrm{DIST}_{t_j}(B_i,B_{i+1}) \ge 1 - 3|m_{a_{j-1}}|, \qquad (121)$$



which exceeds the lower bound of ( 116 ), ie, 1 — |(n^ + jn^)|ma^j.j_^ |. Assume now 
that tj < t < tj+i- Observe that 

The $i$-th row of $P_j$ is the same as the $(2^j + 1 - i)$-th row read backwards. This 
type of symmetry is closed under multiplication, so it is also true of $P_j^s$. By (83), 
for any $s \ge 0$, it then follows that 

The following recurrence relation holds: $b_l^{(0)} = 1$ if $1 \le l \le 2^{j-1}$, and $b_l^{(0)} = -1$ 
else. For $s \ge 0$, we get the identities below for $l \le 2^{j-1}$, plus an antisymmetric 
set for $l > 2^{j-1}$. 



1 

< 

3 



+ 2h^^ 

23+2-1 23+1-1 "23-1 

-b['^ - 2b^i^ 



u(«) 



if / = 1; 

if 1 < / < 23-^; 

if I = 2^-1; 

if / = 2^-1 + 1; 

if 2^-1 + 1< / < 2J; 

if l = 2K 



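We find by induction that 

$$b_1^{(s)} \ge \cdots \ge b_{2^{j-1}}^{(s)} \ge 3^{-s};$$

therefore, 

$$x_{2^{j-1}}^T Q_j^s \bigl\{(1,-1) \otimes \mathbf{1}_{2^{j-1}}\bigr\} = -2b_{2^{j-1}}^{(s)} \le -3^{-s}. \qquad (122)$$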
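
The row symmetry and the decay of the middle entry can be checked numerically on a small example. The sketch below uses a hypothetical stand-in for $P_j$, which is an assumption and not taken from (83): each bird on a path averages with its neighbors with weight $\frac13$, and the two end birds keep weight $\frac23$ on themselves. Exact fractions avoid rounding issues.

```python
from fractions import Fraction


def averaging_matrix(size):
    """Hypothetical stand-in for P_j: neighbor averaging on a path."""
    third = Fraction(1, 3)
    P = [[Fraction(0)] * size for _ in range(size)]
    for r in range(size):
        for c in (r - 1, r, r + 1):
            if 0 <= c < size:
                P[r][c] += third
        P[r][r] += 1 - sum(P[r])  # leftover mass stays on the diagonal
    return P


def matmul(A, B):
    """Plain matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def apply(M, v):
    """Matrix-vector product."""
    return [sum(M[r][k] * v[k] for k in range(len(v))) for r in range(len(M))]


def is_row_reversal_symmetric(M):
    """Row i equals row (size + 1 - i) read backwards, for every i."""
    n = len(M)
    return all(M[r] == M[n - 1 - r][::-1] for r in range(n))


def middle_entries(size, steps):
    """Entry just left of the middle after each of the first `steps` steps,
    starting from the vector (1,...,1,-1,...,-1)."""
    P = averaging_matrix(size)
    b = [Fraction(1)] * (size // 2) + [Fraction(-1)] * (size // 2)
    out = []
    for _ in range(steps):
        b = apply(P, b)
        out.append(b[size // 2 - 1])
    return out
```

For this stand-in matrix, every power keeps the row symmetry, and the entry just left of the middle after $s$ steps is at least $3^{-s}$, matching the shape of the bound used in (122).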



Since the two middle birds in the flock $F_{a_j}$ get attached in the flocking network 
at time $t_j$, $\mathrm{DIST}_{t_j}(B_i, B_{i+1}) \le 1$. Assume that $F_{a_j}$ does not undergo a flip. Then, 
by (117, 118, 119), for $t_j < t \le t_{j+1}$, 
t-ti 



t-ti 



s=l 

< 1 + \uia^_,\ ^ xl-iQj{(_\) ® i2^-i} + xl-iQj C 

s=l ^ ^ s=l 

<l-i|m,^_J+J]|x^._iQKI 

S>1 

<l-l|m,^_J + |m,^_jJ^e-i"— 

S>1 

< 1 - i(l - o(l))|m,^_J = 1 - 1(1 - o(l))|m,,(^,_J, 



which proves the upper bound in (116) for $i = 2^{j-1}$. The negative geometric series 
we obtain from (122) reflects the "momentum" (minus the vibrations) of the two 
flocks colliding and penetrating into each other's zone of influence before being 
stabilized. 

Suppose now that the flock $F_{a_j}$ undergoes a flip at time $t_j + n^f$. The previous 
analysis holds for $t_j < t \le t_j + n^f$; so assume that $t_j + n^f < t \le t_{j+1}$. By (120) 
and $h(i) = j$, 



t-tj-nf 



t—tj—nf 

E 

s=l 



-n(sn-2+n/-2) 



o m, 



«h(i)-l I 
By (118), therefore, 



s=nf+l 



s=l 



t—tj —n^ 



s=l 

< 1 - g(l -o(l))|ma,(^)_J +o(|ma^(^,_J) < 1 - i|ma,(,j_J. 



This establishes the upper bound in (116) for $i = 2^{j-1}$, whether there is a flip or 
not. We prove the lower bound as follows. By (118, 120, 121), for $t_j \le t \le t_{j+1}$, 



t-i, 



s=l 



> 1 - 3|ma^._J - n^|ma^_J ^( 



-n(s/n2) 



S>1 



> 1 — 71 I m, 



■a,_il 



1 — n |m, 



Note that this derivation still holds if the flock "flips," ie, reverses the sign of 
$Q_j^s v^{a_j}$. This establishes (116) for $i = 2^{j-1}$. 
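
The lower-bound derivation sums terms of the form $e^{-\Omega(s/n^2)}$ over $s \ge 1$. As a quick check that such a sum is $O(n^2)$, the sketch below replaces the hidden constant by an explicit parameter `c`, which is an assumption made only for illustration, and compares the closed form of the geometric series with the bound $n^2/c$.

```python
import math


def geometric_tail(n: int, c: float = 1.0) -> float:
    """Closed form of sum_{s>=1} exp(-c*s/n^2), i.e. 1/(exp(c/n^2) - 1)."""
    return 1.0 / math.expm1(c / n**2)


def partial_tail(n: int, c: float = 1.0, terms: int = 2000) -> float:
    """Direct summation of the first `terms` terms, for comparison."""
    return sum(math.exp(-c * s / n**2) for s in range(1, terms + 1))
```

Since $e^x - 1 \ge x$ for $x \ge 0$, the closed form never exceeds $n^2/c$, which is why a factor polynomial in $n$ suffices to absorb the whole sum.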




Figure 34: The birds $B_i$ and $B_{i+1}$ are joined earlier than $t_j$. 



Case II. ($i < 2^{j-1}$): This implies that $h(i) < j$ (Figure 34). Recall that $j \ge 2$. 
We omit the case $i > 2^{j-1}$, which is treated similarly. The case $t = t_j$ follows 
by induction for $j' = j - 1$ and $t = t_{j'+1}$. Note that $t \ne t_{j-1}$, so the inductive 
use of (116) does not provide 1 as an upper bound; furthermore it provides even 
stronger bounds, as $j' < j$. We assume now that $t_j < t \le t_{j+1}$. By (118, 120), 

S>1 

s>l 



We apply (116) inductively once more for $j' = j - 1$ and $t = t_{j'+1}$: 

1 _ + _ i)„4^|m,^^^^_j < DiSTi^.(^„i3,+i) < 1 - i(l - i(j - l))|m,^(^^_J; 

hence, for tj < t < tj+i, 

1 _ 5(^5 ^ _ X)n^)\v^a,^^^_^\ - 0(nVa,-il) < DISTt(^i,^i+i) < 

1 - i(l - ^(J - l))|ma,(,_J + 0(nVa,-J). 



Because j > by (89), |ma^_J < n '^|m(jj^^^j_J, for h{i) > 1. In the case 

h{i) = 1, 



|ma._J < |maj <n " <An |m, 



n 



-11 



for $c > 11$. This shows that, in all cases, $|m_{a_{j-1}}| \le 4n^{-11}|m_{a_{h(i)-1}}|$; hence (116). 
Since sums involving velocities are immediately taken with absolute values, the 
same derivation can be repeated verbatim in the case of a flip. □ 



5 Concluding Remarks 

We have established the first general convergence bound for a standard neighbor- 
based flocking model. We believe that it can be generalized to many of the metric 
and topological variants of the Vicsek model. We have shown that the spectral 
shift underpinning the slow convergence is resistant to noise decaying with time. 
Without temporal decay, injecting a fixed amount of entropy into the system at 
each step is likely to produce widely different behaviors. Whether the techniques 
introduced in this work, in particular the geometric approach, can shed light on 
phase transitions reported experimentally in [3,29] is a fascinating open question. 